[xen-4.14-testing test] 168013: tolerable FAIL - PUSHED

2022-02-04 Thread osstest service owner
flight 168013 xen-4.14-testing real [real]
flight 168019 xen-4.14-testing real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/168013/
http://logs.test-lab.xenproject.org/osstest/logs/168019/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 168019-retest
 test-arm64-arm64-libvirt-raw 12 debian-di-install fail pass in 168019-retest

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail in 168019 never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail in 168019 never pass
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167964
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 167964
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 167964
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167964
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 167964
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 167964
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 167964
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 167964
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 167964
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 167964
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 167964
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167964
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 

[linux-linus test] 168012: tolerable FAIL - PUSHED

2022-02-04 Thread osstest service owner
flight 168012 linux-linus real [real]
flight 168020 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/168012/
http://logs.test-lab.xenproject.org/osstest/logs/168020/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-credit1  23 guest-start.2   fail pass in 168020-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 168004
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 168004
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168004
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 168004
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 168004
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 168004
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 168004
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 168004
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass

version targeted for testing:
 linux                dcb85f85fa6f142aae1fe86f399d4503d49f2b60
baseline version:
 linux                1f2cfdd349b7647f438c1e552dc1b983da86d830

Last test of basis   168004  2022-02-04 00:41:22 Z    1 days
Testing same since   168012  2022-02-04 16:57:25 Z    0 days    1 attempts


People who touched revisions under test:
  Alex Elder 
  Alexander Aring 
  Alexandre Belloni 
  Alexei Starovoitov 
  Andrii Nakryiko 
  Arınç ÜNAL 
  Camel Guo 
  Christian Brauner 
  

Re: x86: insn-eval.c's use of native_store_gdt()

2022-02-04 Thread Ricardo Neri
On Fri, Feb 04, 2022 at 03:13:52PM +0100, Jan Beulich wrote:
> On 04.02.2022 15:08, Thomas Gleixner wrote:
> > On Fri, Feb 04 2022 at 10:23, Jan Beulich wrote:
> >> On 30.11.2021 12:09, Jan Beulich wrote:
> >>> I think it is b968e84b509d ("x86/iopl: Fake iopl(3) CLI/STI usage")
> >>> which uncovered an issue with get_desc() trying to access the GDT, as
> >>> introduced by 670f928ba09b ("x86/insn-eval: Add utility function to
> >>> get segment descriptor"). When running in a PV domain under Xen, the
> >>> (hypervisor's) GDT isn't accessible; with UMIP enabled by Xen even
> >>> SGDT wouldn't work, as the kernel runs in ring 3.
> >>>
> >>> There's currently no hypercall to retrieve a descriptor from the GDT.
> >>> It is instead assumed that kernels know where their present GDT
> >>> lives. Can the native_store_gdt() be replaced there, please?
> >>>
> >>> For context (I don't think it should matter much here) I'm observing
> >>> this with the kernel put underneath a rather old distro, where
> >>> hwclock triggers this path.
> >>
> >> I'd like to note that the issue still exists in 5.16.
> > 
> > I'd like to note, that I've seen no patches to that effect.
> 
> I could have worded it that way as well, yes.

Hi Jan, I am sorry I missed your email. I'll look at the issue you
describe and get back to you.

Thanks and BR,
Ricardo



Re: [PATCH v3 1/2] dt-bindings: arm: xen: document Xen iommu device

2022-02-04 Thread Rob Herring
On Wed, Jan 26, 2022 at 10:56:39AM -0800, Stefano Stabellini wrote:
> On Wed, 26 Jan 2022, Robin Murphy wrote:
> > On 2022-01-26 15:09, Sergiy Kibrik wrote:
> > > Hi Robin,
> > > 
> > > > 
> > > > This could break Linux guests, since depending on the deferred probe
> > > > timeout setting it could lead to drivers never probing because the
> > > > "IOMMU" never becomes available.
> > > > 
> > > 
> > > I've noticed no deferred probe timeouts when booting with this patch.
> > > Could you please explain more on how this would break guests?
> > 
> > Right now I think it would actually require command-line intervention, e.g.
> > "fw_devlink=on" or "deferred_probe_timeout=3600" (with modules enabled
> > for the latter to take full effect), but I'm wary of the potential for
> > future config options to control those behaviours by default.
> > options to control those behaviours by default.

fw_devlink=on is now the default (for at least a couple of cycles).
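
For anyone reproducing the behaviour Robin describes, both knobs are ordinary kernel command-line parameters. An illustrative U-Boot fragment (the values are examples only, not a recommendation):

```
# Append to the kernel command line before booting; fw_devlink controls
# device-link enforcement, deferred_probe_timeout (seconds) controls how
# long probes keep being retried for missing suppliers such as an IOMMU.
setenv bootargs "${bootargs} fw_devlink=on deferred_probe_timeout=3600"
```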

> 
> If deferred_probe_timeout=3600 was specified, we would just need an
> IOMMU driver in Linux for the "xen,iommu-el2-v1" node to solve the
> problem, right? I guess I am trying to say that it wouldn't be a device
> tree interface problem but rather a Linux implementation discussion.

You would have to add that IOMMU driver to old, existing kernels if you 
want compatibility with a new DT. Otherwise, that kernel would stop 
booting with a new DT.

Rob




[xen-unstable test] 168008: tolerable FAIL

2022-02-04 Thread osstest service owner
flight 168008 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168008/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-credit1  22 guest-start/debian.repeat  fail pass in 168001

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 168001
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 168001
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 168001
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168001
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 168001
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 168001
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 168001
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168001
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 168001
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 168001
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168001
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 168001
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version 

Re: Xen data from meta-virtualization layer

2022-02-04 Thread Michael Walle

Hi Julien,

On 2022-02-05 00:29, Julien Grall wrote:
[..]

But not a very user-friendly one, though. I guess the first UART
is disabled/removed by Xen? I haven't looked at how it is handled.

Can't we search for other UARTs with the same interrupt and disable
these, too? Maybe conditionally by the SoC compatible?


The problem sounds quite similar to the one we had on sunxi. Although
the UART was on the same page rather than sharing interrupts.

Xen has a per-platform hook to prevent a device from being assigned
to dom0. For an example, you could look at:

https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/arm/platforms/sunxi.c


Nice. At least, this is working now ;) I'll send a patch in
the next few days.

I guess it's safe to assume that we can always remove both UARTs
on the LS1028A (probably on most Layerscapes). The most common
use case is to use the first UART for Xen. You could run Xen
without a console (?), but then you'd miss the possibility to map
the DUART. Or there could be a new driver for the LPUART on the
LS1028A. In that case, the DUART wouldn't be used by Xen either.

But I think we should start simple and just remove the DUART
altogether via that hook.

-michael



Re: Xen data from meta-virtualization layer

2022-02-04 Thread Julien Grall
Hi Michael,

On Fri, 4 Feb 2022 at 22:42, Michael Walle  wrote:
> On 2022-02-04 22:11, Stefano Stabellini wrote:
> > On Fri, 4 Feb 2022, Michael Walle wrote:
> >> > In regards to the reserved-memory regions, maybe we are not seeing them
> >> > because Leo posted the host device tree, not the one passed at runtime
> >> > from u-boot to Linux?
> >> >
> >> > If so, Leo, could you please boot Linux on native (no Xen) and get the
> >> > device tree from there at runtime using dtc -I fs -O dts
> >> > /proc/device-tree ?
> >> >
> >> >
> >> > However, the name of the reserved-memory region created by u-boot seems
> >> > to be "lpi_rd_table". I cannot find any mentions of lpi_rd_table in the
> >> > Linux kernel tree either.
> >> >
> >> > Zhiqiang, Leo is trying to boot Xen on sAL28. Linux booting on Xen
> >> > throws errors in regards to GIC/ITS initialization. On other hardware
> >> > Xen can use and virtualize GICv3 and ITS just fine. Could you please
> >> > explain what is different about sAL28 and how Xen/Linux is expected to
> >> > use the lpi_rd_table reserved-memory region?
> >>
> >> I actually stumbled across this thread after trying out Xen myself.
> >> I'm using the latest vanilla u-boot (with pending PSCI patches),
> >> vanilla kernel and vanilla Xen.
> >>
> >> So far I've discovered that Xen complains that it cannot route IRQ64
> >> to dom0. That is because on the LS1028A there is a dual UART (two
> >> 16550s with one shared interrupt) and Xen takes the first UART and
> >> then tries to map the interrupt of the second UART to Linux. For now,
> >> I don't know how this is solved correctly. As a quick hack, I removed
> >> the second uart node from the device tree.
> >
> > This is an interesting problem. Removing the second UART is a good
> > workaround for now as there is no obvious solution I think.
>
> But not a very user-friendly one, though. I guess the first UART
> is disabled/removed by Xen? I haven't looked at how it is handled.
>
> Can't we search for other UARTs with the same interrupt and disable
> these, too? Maybe conditionally by the SoC compatible?

The problem sounds quite similar to the one we had on sunxi. Although
the UART was on the same page rather than sharing interrupts.

Xen has a per-platform hook to prevent a device from being assigned
to dom0. For an example, you could look at:

https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/arm/platforms/sunxi.c
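
The shape of such a hook, modelled loosely on the sunxi blacklist linked above, is essentially a match of a device-tree node's compatible string against a deny list (the real Xen code works on a struct dt_device_node via dt_match_node()). A standalone sketch, where "fsl,ns16550" is a stand-in for whatever compatible the LS1028A DUART nodes actually carry:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Compatibles that dom0 must not be handed; entries here are
 * hypothetical placeholders for the platform's real UART compatibles. */
static const char *const blacklist_compats[] = {
    "fsl,ns16550",
    NULL,
};

/* Return true if a node with this compatible should be hidden from dom0. */
static bool device_is_blacklisted(const char *compatible)
{
    for (const char *const *c = blacklist_compats; *c; c++)
        if (!strcmp(*c, compatible))
            return true;
    return false;
}

int main(void)
{
    assert(device_is_blacklisted("fsl,ns16550"));
    assert(!device_is_blacklisted("arm,pl011"));
    return 0;
}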

> >> This is the first developer experience with Xen, so please bear with
> >> me :) It seems that Xen doesn't add the master to the IOMMU. To me it
> >> seems that only devices with an 'iommus' dt property are added. But in
> >> the case of PCI devices the parent only has an iommu-map property.
> >>
> >> And it makes me wonder why Leo has an almost working setup. Maybe I'm
> >> missing some patches though.
> >
> > Xen 4.16 is able to parse StreamID in the "iommus" property and also
> > the "mmu-masters" property. But it is not able to parse the "iommu-map"
> > property yet. So if 0x42a is described in the device tree using
> > "iommu-map" then the error makes sense.
> >
> > A simple solution is to replace iommu-map with iommus in device tree.
>
> I'm not sure this is so easy, because they are dynamically assigned
> by the bootloader. Sure for now I could do that I guess, but iommu=0
> works as well ;)
>
> I now got Xen and Linux booting and see the same problems with the
> GIC ITS, that is that the enetc interrupts aren't delivered to the
> dom0 linux. I've also applied the patch in this thread and I'm
> seeing the same as Leo. Full boot log is here [1].
>
> I noticed the following.
> [0.168544] pci :00:00.0: Failed to add - passthrough or
> MSI/MSI-X might fail!

This message is harmless. It is printed because Xen on Arm
doesn't implement the hypercall to add a PCI device. On Arm,
we don't need it yet (it might be necessary for PCI passthrough), and
MSI/MSI-X are handled/registered the same way as on bare metal
(in your case, through the ITS).

>
> Not sure if it should work nonetheless.

I looked through the log and couldn't spot anything obvious. However,
skimming through Linux, I noticed there are some changes in the
ITS for Freescale (now NXP), such as:

drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c

Is it something that might be used on the board you are using?

If the answer is yes, then my suggestion would be to look
how this is meant to interact with the ITS. It might be possible
that we are missing some pieces in Xen to properly support it.

>
> > It is possible that someone in CC to this email might already have a
> > patch to introduce parsing of iommu-map in Xen.

Pass. I don't have any and couldn't find any submission on the ML.


>
> I guess they've used the old mmu-masters property.
>
> Btw. I don't know if it matters, but the SMARC-sAL28 normally doesn't
> use TF-A and runs without it. Nonetheless, I've booted the board with
> the bl31 from NXP and it doesn't help either. There is 

Re: Xen data from meta-virtualization layer

2022-02-04 Thread Michael Walle

Hi Stefano,

On 2022-02-04 22:11, Stefano Stabellini wrote:

On Fri, 4 Feb 2022, Michael Walle wrote:

> In regards to the reserved-memory regions, maybe we are not seeing them
> because Leo posted the host device tree, not the one passed at runtime
> from u-boot to Linux?
>
> If so, Leo, could you please boot Linux on native (no Xen) and get the
> device tree from there at runtime using dtc -I fs -O dts
> /proc/device-tree ?
>
>
> However, the name of the reserved-memory region created by u-boot seems
> to be "lpi_rd_table". I cannot find any mentions of lpi_rd_table in the
> Linux kernel tree either.
>
> Zhiqiang, Leo is trying to boot Xen on sAL28. Linux booting on Xen
> throws errors in regards to GIC/ITS initialization. On other hardware
> Xen can use and virtualize GICv3 and ITS just fine. Could you please
> explain what is different about sAL28 and how Xen/Linux is expected to
> use the lpi_rd_table reserved-memory region?

I actually stumbled across this thread after trying out Xen myself. I'm
using the latest vanilla u-boot (with pending PSCI patches), vanilla
kernel and vanilla Xen.

So far I've discovered that Xen complains that it cannot route IRQ64 to
dom0. That is because on the LS1028A there is a dual UART (two 16550s
with one shared interrupt) and Xen takes the first UART and then tries
to map the interrupt of the second UART to Linux. For now, I don't know
how this is solved correctly. As a quick hack, I removed the second uart
node from the device tree.


This is an interesting problem. Removing the second UART is a good
workaround for now as there is no obvious solution I think.


But not a very user-friendly one, though. I guess the first UART
is disabled/removed by Xen? I haven't looked at how it is handled.

Can't we search for other UARTs with the same interrupt and disable
these, too? Maybe conditionally by the SoC compatible?


But what is more severe is that the IOMMU isn't set up correctly. I'm
getting the following faults:

(XEN) smmu: /soc/iommu@500: Unexpected global fault, this could be serious
(XEN) smmu: /soc/iommu@500: GFSR 0x8002, GFSYNR0 0x, GFSYNR1 0x042a, GFSYNR2 0x

If I decode it correctly, the streamid should be 0x2a, which would be one
of the PCI devices on the internal root complex. Probably the network
card.


Yes, there is a DMA transaction with an "unknown" StreamID. I think the
StreamID is 0x42a. It means that there is a DMA master on the board with
StreamID 0x42a that is either:

- not described in the device tree
- described in the device tree with a different StreamID
- the right StreamID is described in the device tree, but it is not
  picked up by Xen


See below.

This is the first developer experience with Xen, so please bear with me
:) It seems that Xen doesn't add the master to the IOMMU. To me it seems
that only devices with an 'iommus' dt property are added. But in the
case of PCI devices the parent only has an iommu-map property.

And it makes me wonder why Leo has an almost working setup. Maybe I'm
missing some patches though.


Xen 4.16 is able to parse StreamID in the "iommus" property and also the
"mmu-masters" property. But it is not able to parse the "iommu-map"
property yet. So if 0x42a is described in the device tree using
"iommu-map" then the error makes sense.

A simple solution is to replace iommu-map with iommus in device tree.


I'm not sure this is so easy, because they are dynamically assigned
by the bootloader. Sure for now I could do that I guess, but iommu=0
works as well ;)

I now got Xen and Linux booting and see the same problems with the
GIC ITS, that is that the enetc interrupts aren't delivered to the
dom0 linux. I've also applied the patch in this thread and I'm
seeing the same as Leo. Full boot log is here [1].

I noticed the following.
[0.168544] pci :00:00.0: Failed to add - passthrough or MSI/MSI-X might fail!


Not sure if it should work nonetheless.


It is possible that someone in CC to this email might already have a
patch to introduce parsing of iommu-map in Xen.


I guess they've used the old mmu-masters property.

Btw. I don't know if it matters, but the SMARC-sAL28 normally doesn't
use TF-A and runs without it. Nonetheless, I've booted the board with
the bl31 from NXP and it doesn't help either. There is still a
difference between the NXP bootflow which uses bl1/bl2/bl31/u-boot
and this board which uses u-boot-spl/u-boot or u-boot-spl/bl31/u-boot.

I just found GIC setup code in the bl31. I'll also have a look there.

-michael

[1] https://pastebin.com/raw/XMjE3BvG



Re: [PATCH 04/16] x86/P2M: move map_domain_gfn() (again)

2022-02-04 Thread George Dunlap
On Mon, Jul 5, 2021 at 5:07 PM Jan Beulich  wrote:

> The main user is the guest walking code, so move it back there; commit
> 9a6787cc3809 ("x86/mm: build map_domain_gfn() just once") would perhaps
> better have kept it there in the first place. This way it'll only get
> built when it's actually needed (and still only once).
>
> This also eliminates one more CONFIG_HVM conditional from p2m.c.
>
> Signed-off-by: Jan Beulich 
>

Reviewed-by: George Dunlap 


Re: [PATCH 03/16] x86/P2M: drop a few CONFIG_HVM

2022-02-04 Thread George Dunlap
On Mon, Jul 5, 2021 at 5:06 PM Jan Beulich  wrote:

> This is to make it easier to see which parts of p2m.c still aren't HVM-
> specific: In one case the conditionals sat in an already guarded region,
> while in the other case P2M_AUDIT implies HVM.
>

I think it would be much easier to understand what's going on if it
were more like this:

---
x86/p2m: P2M_AUDIT implies CONFIG_HVM

Remove one #endif / #ifdef CONFIG_HVM pair to make all the audit code
CONFIG_HVM only.  This is to make it easier to see which parts of p2m.c
still aren't HVM-specific.

While here, remove a redundant set of nested #ifdef CONFIG_HVM.
---

Reviewed-by: George Dunlap 


Re: [PATCH 02/16] x86/P2M: introduce p2m_{add,remove}_page()

2022-02-04 Thread George Dunlap
On Mon, Jul 5, 2021 at 5:06 PM Jan Beulich  wrote:

> p2m_add_page() is simply a rename from guest_physmap_add_entry().
> p2m_remove_page() then is its counterpart, despite rendering
> guest_physmap_remove_page(). This way callers can use suitable pairs of
> functions (previously violated by hvm/grant_table.c).
>

Obviously this needs some clarification.  While we're here, I find this a
bit confusing; I tend to use the present tense for the way the code is
before the patch, and the imperative for what the patch does; so I'd say:

Rename guest_physmap_add_entry() to p2m_add_page; make
guest_physmap_remove_page a wrapper with p2m_remove_page.  That way callers
can use suitable pairs...

Other than that looks good.

 -George


Re: [PATCH 01/16] x86/P2M: rename p2m_remove_page()

2022-02-04 Thread George Dunlap
On Mon, Jul 5, 2021 at 5:05 PM Jan Beulich  wrote:

> This is in preparation to re-using the original name.
>
> Signed-off-by: Jan Beulich 
>

Hey Jan,

This series overall looks good; thanks for taking this on.

Functionally this patch looks good; just one question...

--- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -788,8 +788,8 @@ void p2m_final_teardown(struct domain *d
>  #ifdef CONFIG_HVM
>
>  static int __must_check
> -p2m_remove_page(struct p2m_domain *p2m, gfn_t gfn, mfn_t mfn,
> -unsigned int page_order)
> +p2m_remove_entry(struct p2m_domain *p2m, gfn_t gfn, mfn_t mfn,
> + unsigned int page_order)
>

One question that's naturally raised for both this and the following patch
is, what is the new naming "scheme" for these renamed functions, and how do
they relate to the old scheme?

Overall it seems like the intention is that "guest_physmap_..." can be
called on a domain which may be PV or HVM, while "p2m_..." should only be
called on HVM domains.

There's also "..._entry" vs "..._page".  Does the p2m_remove_page /
p2m_remove_entry distinction have a meaning, and is it the same meaning as
guest_physmap_add_page / guest_physmap_add_entry?  Or is it similar to
p2m_init_nestedp2m / p2m_nestedp2m_init -- we need both functions and
don't want to make the names longer?

 -George


Re: Xen data from meta-virtualization layer

2022-02-04 Thread Stefano Stabellini
On Fri, 4 Feb 2022, Michael Walle wrote:
> > In regards to the reserved-memory regions, maybe we are not seeing them
> > because Leo posted the host device tree, not the one passed at runtime
> > from u-boot to Linux?
> > 
> > If so, Leo, could you please boot Linux on native (no Xen) and get the
> > device tree from there at runtime using dtc -I fs -O dts
> > /proc/device-tree ?
> > 
> > 
> > However, the name of the reserved-memory region created by u-boot seems
> > to be "lpi_rd_table". I cannot find any mentions of lpi_rd_table in the
> > Linux kernel tree either.
> > 
> > Zhiqiang, Leo is trying to boot Xen on sAL28. Linux booting on Xen
> > throws errors in regards to GIC/ITS initialization. On other hardware
> > Xen can use and virtualize GICv3 and ITS just fine. Could you please
> > explain what is different about sAL28 and how Xen/Linux is expected to
> > use the lpi_rd_table reserved-memory region?
> 
> I actually stumbled across this thread after trying out Xen myself. I'm
> using lastest vanilla u-boot (with pending PSCI patches), vanilla kernel
> and vanilla Xen.
> 
> So far I've discovered that Xen complains that it cannot route IRQ64 to
> dom0. That is because on the LS1028A there is a dual UART (two 16550s
> with one shared interrupt) and Xen takes the first UART and then tries
> to map the interrupt of the second UART to Linux. For now, I don't know
> how this is solved correctly. As a quick hack, I removed the second uart
> node from the device tree.

This is an interesting problem. Removing the second UART is a good
workaround for now, as I don't think there is an obvious solution.


> But what is more severe is that the iommu isn't set up correctly. I'm
> getting the following faults:
> 
> (XEN) smmu: /soc/iommu@500: Unexpected global fault, this could be serious
> (XEN) smmu: /soc/iommu@500:   GFSR 0x8002, GFSYNR0 0x, 
> GFSYNR1 0x042a, GFSYNR2 0x
> 
> If I decode it correctly, the streamid should be 0x2a which would be one
> of the PCI devices on the internal root complex. Probably the network
> card.

Yes, there is a DMA transaction with an "unknown" StreamID. I think the
StreamID is 0x42a. It means that there is a DMA master on the board with
StreamID 0x42a that is either:

- not described in device tree
- described in device tree with a different StreamID
- described in device tree with the right StreamID, but not picked up
  by Xen


> This is my first developer experience with Xen, so please bear with me
> :) It seems that Xen doesn't add the master to the IOMMU. To me it seems
> that only devices with an 'iommus' dt property are added. But in the case
> of PCI devices the parent only has an iommu-map property.
> 
> And it makes me wonder why Leo has an almost working setup. Maybe I'm
> missing some patches though.

Xen 4.16 is able to parse StreamIDs in the "iommus" property and also
the "mmu-masters" property, but it is not able to parse the "iommu-map"
property yet. So if 0x42a is described in device tree using "iommu-map"
then the error makes sense.

A simple solution is to replace iommu-map with iommus in the device tree.
It is also possible that someone CCed on this email already has a
patch to introduce parsing of iommu-map in Xen.



[PATCH] xen/smp: Speed up on_selected_cpus()

2022-02-04 Thread Andrew Cooper
cpumask_weight() is a horribly expensive way to find if no bits are set, made
worse by the fact that the calculation is performed with the global call_lock
held.

Switch to using cpumask_empty() instead, which will short-circuit as soon as
it finds any set bit in the cpumask.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Juergen Gross 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 

I have not done performance testing, but I would be surprised if this cannot
be measured on a busy or large box.
---
 xen/common/smp.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/xen/common/smp.c b/xen/common/smp.c
index 781bcf2c246c..a011f541f1ea 100644
--- a/xen/common/smp.c
+++ b/xen/common/smp.c
@@ -50,8 +50,6 @@ void on_selected_cpus(
     void *info,
     int wait)
 {
-    unsigned int nr_cpus;
-
    ASSERT(local_irq_is_enabled());
    ASSERT(cpumask_subset(selected, &cpu_online_map));
 
@@ -59,8 +57,7 @@ void on_selected_cpus(
 
    cpumask_copy(&call_data.selected, selected);
 
-    nr_cpus = cpumask_weight(&call_data.selected);
-    if ( nr_cpus == 0 )
+    if ( cpumask_empty(&call_data.selected) )
        goto out;
 
    call_data.func = func;
-- 
2.11.0




Re: [PATCH v2] docs: document patch rules

2022-02-04 Thread Julien Grall

Hi,

On 03/02/2022 12:54, Juergen Gross wrote:

+## The commit message
+
+The commit message is free text describing *why* the patch is done and
+*how* the goal of the patch is achieved. A good commit message will describe
+the current situation, the desired goal, and the way this goal is being
+achieved. Parts of that can be omitted in obvious cases.
+
+In case additional changes are done in the patch (like e.g. cleanups), those
+should be mentioned.
+
+When referencing other patches (e.g. `similar to patch xy ...`) those
+patches should be referenced via their commit id (at least 12 digits)
+and the patch subject, if the very same patch isn't referenced by the
+`Fixes:` tag, too:
+
+Similar to commit 67d01cdb5518 ("x86: infrastructure to allow converting
+certain indirect calls to direct ones") add ...
+
+The following ``git config`` settings can be used to add a pretty format for
+outputting the above style in the ``git log`` or ``git show`` commands:
+
+[core]
+abbrev = 12
+[pretty]
+fixes = Fixes: %h (\"%s\")
+
+Lines in the commit message should not exceed 75 characters, except when


I was under the impression that the commit message should be wrapped at 72
characters. This is because tools like "git log" indent the commit
message by 8 characters.
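The `[core]`/`[pretty]` settings quoted in the patch can be exercised as below. This is a sketch in a throwaway repository; the commit subject is an invented example, and the emitted hash will differ on every run.

```shell
# Demonstrate the quoted git config in a throwaway repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "x86: example subject"

# The settings from the patch under discussion:
git config core.abbrev 12
git config pretty.fixes 'Fixes: %h ("%s")'

# Emits e.g.: Fixes: 0123456789ab ("x86: example subject")
git show -s --pretty=fixes HEAD
```

With `core.abbrev = 12`, `%h` expands to a 12-character abbreviated commit id, matching the "at least 12 digits" rule in the document.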



+copying error output directly into the commit message.
+
+## Tags
+
+Tags are entries in the form
+
+Tag: something
+
+In general tags are added in chronological order. So a `Reviewed-by:` tag
+should be added **after** the `Signed-off-by:` tag, as the review happened
+after the patch was written.
+
+Do not split a tag across multiple lines, tags are exempt from the
+"wrap at 75 columns" rule in order to simplify parsing scripts.


This would need to be adjusted depending on the answer above.


+
+### Origin:
+
+Xen has inherited some source files from other open source projects. In case
+a patch modifying such an inherited file is taken from that project (maybe in
+modified form), the `Origin:` tag specifies the source of the patch:
+
+Origin:  


NIT: Like you did for the Fixes tag, can you make clear that the commit id
should be the first 12 characters? So the line...



+
+E.g.:
+
+Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
f093b08c47b3


... doesn't get horribly long.


+
+All tags **above** the `Origin:` tag are from the original patch (which
+should all be kept), while tags **after** `Origin:` are related to the
+normal Xen patch process as described here.


Cheers,

--
Julien Grall



[xen-unstable-smoke test] 168011: tolerable all pass - PUSHED

2022-02-04 Thread osstest service owner
flight 168011 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168011/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  820cc393434097f3b7976acdccbf1d96071d6d23
baseline version:
 xen  75cc460a1b8cfd8e5d2c4302234ee194defe4872

Last test of basis   167999  2022-02-03 13:01:51 Z1 days
Testing same since   168011  2022-02-04 16:01:44 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   75cc460a1b..820cc39343  820cc393434097f3b7976acdccbf1d96071d6d23 -> smoke



[libvirt test] 168006: regressions - FAIL

2022-02-04 Thread osstest service owner
flight 168006 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168006/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  bf36dcb2a6272a1ea7eb770878df2cbd455a6679
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  574 days
Failing since151818  2020-07-11 04:18:52 Z  573 days  555 attempts
Testing same since   168006  2022-02-04 04:18:55 Z0 days1 attempts


People who touched revisions under test:
Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Aleksei Zakharov 
  Andika Triwidada 
  Andrea Bolognani 
  Ani Sinha 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastian Germann 
  Bastien Orivel 
  BiaoXiang Ye 
  Bihong Yu 
  Binfeng Wu 
  Bjoern Walk 
  Boris Fiuczynski 
  Brad Laue 
  Brian Turek 
  Bruno Haible 
  Chris Mayo 
  Christian Borntraeger 
  Christian Ehrhardt 
  Christian Kirbach 
  Christian Schoenebeck 
  Christophe Fergeau 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Cédric Bosdonnat 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Didik Supriadi 
  dinglimin 
  Divya Garg 
  Dmitrii Shcherbakov 
  Dmytro Linkin 
  Eiichi Tsukata 
  Emilio Herrera 
  Eric Farman 
  Erik Skultety 
  Fabian Affolter 
  Fabian Freyer 
  Fabiano Fidêncio 
  Fangge Jin 
  Farhan Ali 
  Fedora Weblate Translation 
  Franck Ridel 
  Gavi Teitz 
  gongwei 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Hela Basa 
  Helmut Grohne 
  Hiroki Narukawa 
  Hyman Huang(黄勇) 
  Ian Wienand 
  Ioanna Alifieraki 
  Ivan Teterevkov 
  Jakob Meng 
  Jamie Strandboge 
  Jamie Strandboge 
  Jan Kuparinen 
  jason lee 
  Jean-Baptiste Holcroft 
  Jia Zhou 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jinsheng Zhang 
  Jiri Denemark 
  Joachim Falk 
  John Ferlan 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Justin Gatzen 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Koichi Murase 
  Kristina Hanicova 
  Laine Stump 
  Laszlo Ersek 
  Lee Yarwood 
  Lei Yang 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Liu Yiding 
  Lubomir Rintel 
  Luke Yue 
  Luyao Zhong 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Masayoshi Mizuma 
  Matej Cepl 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Meina Li 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Moshe Levi 
  Muha Aliss 
  Nathan 
  Neal Gompa 
  Nick Chevsky 
  Nick Shyrokovskiy 
  Nickys Music Group 
  Nico Pache 
  Nicolas Lécureuil 
  Nicolas Lécureuil 
  Nikolay Shirokovskiy 
  Olaf Hering 
  Olesya Gerasimenko 
  Or Ozeri 
  Orion Poplawski 
  Pany 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peng Liang 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Praveen K Paladugu 
  Richard W.M. Jones 
  Ricky Tigg 
  Robin Lee 
  Rohit Kumar 
  Roman Bogorodskiy 
  Roman Bolshakov 
  Ryan Gahagan 
  Ryan Schmidt 
  Sam Hartman 
  Scott Shambarger 
  Sebastian Mitterle 
  SeongHyun Jo 
  Shalini Chellathurai Saroja 
  Shaojun Yang 
  

[PATCH] x86/hvm: Fix boot on systems where HVM isn't available

2022-02-04 Thread Andrew Cooper
c/s 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to
alt-call") went too far with dropping NULL function pointer checks.

smp_callin() calls hvm_cpu_up() unconditionally.  When the platform doesn't
support HVM, hvm_enable() exits without filling in hvm_funcs, after which the
altcall pass nukes the (now unconditional) indirect call, causing:

  (XEN) [ Xen-4.17.0-10.18-d  x86_64  debug=y  Not tainted ]
  (XEN) CPU:1
  (XEN) RIP:e008:[] start_secondary+0x393/0x3b7
  (XEN) RFLAGS: 00010086   CONTEXT: hypervisor
  ...
  (XEN) Xen code around  (start_secondary+0x393/0x3b7):
  (XEN)  ff ff 8b 05 1b 84 17 00 <0f> 0b 0f ff ff 90 89 c3 85 c0 0f 84 db fe ff 
ff
  ...
  (XEN) Xen call trace:
  (XEN)[] R start_secondary+0x393/0x3b7
  (XEN)[] F __high_start+0x42/0x60

To make matters worse, __stop_this_cpu() calls hvm_cpu_down() unconditionally
too, so what happens next is:

  (XEN) [ Xen-4.17.0-10.18-d  x86_64  debug=y  Not tainted ]
  (XEN) CPU:0
  (XEN) RIP:e008:[] __stop_this_cpu+0x12/0x3c
  (XEN) RFLAGS: 00010046   CONTEXT: hypervisor
  ...
  (XEN) Xen code around  (__stop_this_cpu+0x12/0x3c):
  (XEN)  48 89 e5 e8 8a 1d fd ff <0f> 0b 0f ff ff 90 0f 06 db e3 48 89 e0 48 0d 
ff
  ...
  (XEN) Xen call trace:
  (XEN)[] R __stop_this_cpu+0x12/0x3c
  (XEN)[] F smp_send_stop+0xdd/0xf8
  (XEN)[] F machine_restart+0xa2/0x298
  (XEN)[] F arch/x86/shutdown.c#__machine_restart+0xb/0x11
  (XEN)[] F smp_call_function_interrupt+0xbf/0xea
  (XEN)[] F call_function_interrupt+0x35/0x37
  (XEN)[] F do_IRQ+0xa3/0x6b5
  (XEN)[] F common_interrupt+0x10a/0x120
  (XEN)[] F __udelay+0x3a/0x51
  (XEN)[] F __cpu_up+0x48f/0x734
  (XEN)[] F cpu_up+0x7d/0xde
  (XEN)[] F __start_xen+0x200b/0x2618
  (XEN)[] F __high_start+0x4f/0x60

which recurses until hitting a stack overflow.  The #DF handler, which resets
its stack on each invocation, loops indefinitely.

Reinstate the NULL function pointer checks for hvm_cpu_{up,down}().

Fixes: 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to 
alt-call")
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

RFC.  Not tested yet on the impacted hardware.  It's a Xeon Phi with another
weird thing in need of debugging.  First boot is fine, while the second
boot (loading microcode this time) has a problem with vmx.

I wonder if we want to modify the callers to check for HVM being enabled,
rather than leaving the NULL pointer checks in a position where they're liable
to be reaped again.

Also, the #UD handler really should identify 0f 0b 0f ff ff as the
clobbered-altcall sequence, and provide a message to that effect.
---
 xen/arch/x86/include/asm/hvm/hvm.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
index a441cbc22159..2dd08567f730 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -553,12 +553,16 @@ static inline void hvm_invlpg(struct vcpu *v, unsigned long linear)
 
 static inline int hvm_cpu_up(void)
 {
-    return alternative_call(hvm_funcs.cpu_up);
+    if ( hvm_funcs.cpu_up )
+        return alternative_call(hvm_funcs.cpu_up);
+
+    return 0;
 }
 
 static inline void hvm_cpu_down(void)
 {
-    alternative_vcall(hvm_funcs.cpu_down);
+    if ( hvm_funcs.cpu_down )
+        alternative_vcall(hvm_funcs.cpu_down);
 }
 
 static inline unsigned int hvm_get_insn_bytes(struct vcpu *v, uint8_t *buf)
-- 
2.11.0




[linux-linus test] 168004: tolerable FAIL - PUSHED

2022-02-04 Thread osstest service owner
flight 168004 linux-linus real [real]
flight 168010 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/168004/
http://logs.test-lab.xenproject.org/osstest/logs/168010/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-freebsd12-amd64 19 guest-localmigrate/x10 fail pass in 
168010-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167988
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 167988
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167988
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167988
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167988
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167988
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 167988
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 167988
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 linux    1f2cfdd349b7647f438c1e552dc1b983da86d830
baseline version:
 linux    27bb0b18c208ecd4c0deda6aad28616d73e4133d

Last test of basis   167988  2022-02-02 18:11:17 Z1 days
Failing since167993  2022-02-03 04:34:02 Z1 days2 attempts
Testing same since   168004  2022-02-04 00:41:22 Z0 days1 attempts


People who touched revisions under test:
  "Eric W. Biederman" 
  Chuck Lever 
  Dai Ngo 
  Dan Carpenter 
  

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Roger Pau Monné
On Fri, Feb 04, 2022 at 02:43:07PM +, Oleksandr Andrushchenko wrote:
> 
> 
> On 04.02.22 15:06, Roger Pau Monné wrote:
> > On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote:
> >>
> >> On 04.02.22 14:47, Jan Beulich wrote:
> >>> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
>  On 04.02.22 13:37, Jan Beulich wrote:
> > On 04.02.2022 12:13, Roger Pau Monné wrote:
> >> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> >>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>  On 04.02.22 11:15, Jan Beulich wrote:
> > On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> >> On 04.02.22 09:52, Jan Beulich wrote:
> >>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>  @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>    continue;
>    }
>    
>  +spin_lock(&tmp->vpci_lock);
>  +if ( !tmp->vpci )
>  +{
>  +spin_unlock(&tmp->vpci_lock);
>  +continue;
>  +}
>    for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>    {
>    const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>  @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>    rc = rangeset_remove_range(mem, start, end);
>    if ( rc )
>    {
>  +spin_unlock(&tmp->vpci_lock);
>    printk(XENLOG_G_WARNING "Failed to remove 
>  [%lx, %lx]: %d\n",
>   start, end, rc);
>    rangeset_destroy(mem);
>    return rc;
>    }
>    }
>  +spin_unlock(&tmp->vpci_lock);
>    }
> >>> At the first glance this simply looks like another unjustified 
> >>> (in the
> >>> description) change, as you're not converting anything here but 
> >>> you
> >>> actually add locking (and I realize this was there before, so I'm 
> >>> sorry
> >>> for not pointing this out earlier).
> >> Well, I thought that the description already has "...the lock can 
> >> be
> >> used (and in a few cases is used right away) to check whether vpci
> >> is present" and this is enough for such uses as here.
> >>>   But then I wonder whether you
> >>> actually tested this, since I can't help getting the impression 
> >>> that
> >>> you're introducing a live-lock: The function is called from 
> >>> cmd_write()
> >>> and rom_write(), which in turn are called out of vpci_write(). 
> >>> Yet that
> >>> function already holds the lock, and the lock is not (currently)
> >>> recursive. (For the 3rd caller of the function - init_bars() - 
> >>> otoh
> >>> the locking looks to be entirely unnecessary.)
> >> Well, you are correct: if tmp != pdev then it is correct to acquire
> >> the lock. But if tmp == pdev and rom_only == true
> >> then we'll deadlock.
> >>
> >> It seems we need to have the locking conditional, e.g. only lock
> >> if tmp != pdev
> > Which will address the live-lock, but introduce ABBA deadlock 
> > potential
> > between the two locks.
>  I am not sure I can suggest a better solution here
>  @Roger, @Jan, could you please help here?
> >>> Well, first of all I'd like to mention that while it may have been 
> >>> okay to
> >>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
> >>> dealing
> >>> with DomU-s' lists of PCI devices. The requirement really applies to 
> >>> the
> >>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
> >>> there it probably wants to be a try-lock.
> >>>
> >>> Next I'd like to point out that here we have the still pending issue 
> >>> of
> >>> how to deal with hidden devices, which Dom0 can access. See my RFC 
> >>> patch
> >>> "vPCI: account for hidden devices in modify_bars()". Whatever the 
> >>> solution
> >>> here, I think it wants to at least account for the extra need there.
> >> Yes, sorry, I should take care of that.
> >>
> >>> Now it is quite clear that pcidevs_lock isn't going to help with 
> >>> avoiding
> >>> the deadlock, as it's imo not an option at all to acquire that lock
> >>> 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Oleksandr Andrushchenko


On 04.02.22 15:06, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote:
>>
>> On 04.02.22 14:47, Jan Beulich wrote:
>>> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
 On 04.02.22 13:37, Jan Beulich wrote:
> On 04.02.2022 12:13, Roger Pau Monné wrote:
>> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
 On 04.02.22 11:15, Jan Beulich wrote:
> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>> On 04.02.22 09:52, Jan Beulich wrote:
>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
 @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
   continue;
   }
   
 +spin_lock(&tmp->vpci_lock);
 +if ( !tmp->vpci )
 +{
 +spin_unlock(&tmp->vpci_lock);
 +continue;
 +}
   for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
   {
   const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
 @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
   rc = rangeset_remove_range(mem, start, end);
   if ( rc )
   {
 +spin_unlock(&tmp->vpci_lock);
   printk(XENLOG_G_WARNING "Failed to remove 
 [%lx, %lx]: %d\n",
  start, end, rc);
   rangeset_destroy(mem);
   return rc;
   }
   }
 +spin_unlock(&tmp->vpci_lock);
   }
>>> At the first glance this simply looks like another unjustified (in 
>>> the
>>> description) change, as you're not converting anything here but you
>>> actually add locking (and I realize this was there before, so I'm 
>>> sorry
>>> for not pointing this out earlier).
>> Well, I thought that the description already has "...the lock can be
>> used (and in a few cases is used right away) to check whether vpci
>> is present" and this is enough for such uses as here.
>>>   But then I wonder whether you
>>> actually tested this, since I can't help getting the impression that
>>> you're introducing a live-lock: The function is called from 
>>> cmd_write()
>>> and rom_write(), which in turn are called out of vpci_write(). Yet 
>>> that
>>> function already holds the lock, and the lock is not (currently)
>>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>>> the locking looks to be entirely unnecessary.)
>> Well, you are correct: if tmp != pdev then it is correct to acquire
>> the lock. But if tmp == pdev and rom_only == true
>> then we'll deadlock.
>>
>> It seems we need to have the locking conditional, e.g. only lock
>> if tmp != pdev
> Which will address the live-lock, but introduce ABBA deadlock 
> potential
> between the two locks.
 I am not sure I can suggest a better solution here
 @Roger, @Jan, could you please help here?
>>> Well, first of all I'd like to mention that while it may have been okay 
>>> to
>>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
>>> dealing
>>> with DomU-s' lists of PCI devices. The requirement really applies to the
>>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
>>> there it probably wants to be a try-lock.
>>>
>>> Next I'd like to point out that here we have the still pending issue of
>>> how to deal with hidden devices, which Dom0 can access. See my RFC patch
>>> "vPCI: account for hidden devices in modify_bars()". Whatever the 
>>> solution
>>> here, I think it wants to at least account for the extra need there.
>> Yes, sorry, I should take care of that.
>>
>>> Now it is quite clear that pcidevs_lock isn't going to help with 
>>> avoiding
>>> the deadlock, as it's imo not an option at all to acquire that lock
>>> everywhere else you access ->vpci (or else the vpci lock itself would be
>>> pointless). But a per-domain auxiliary r/w lock may help: Other paths
>>> would acquire it in read mode, and here you'd acquire it in write mode 
>>> (in
>>> the former case around the vpci lock, while in the latter case there may
>>> then not be any need to 

Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices

2022-02-04 Thread Oleksandr Andrushchenko


On 04.02.22 16:30, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>> Reset the command register when assigning a PCI device to a guest:
>> according to the PCI spec the PCI_COMMAND register is typically all 0's
>> after reset.
> It's not entirely clear to me whether setting the hardware register to
> zero is okay. What wants to be zero is the value the guest observes
> initially.
"the PCI spec says the PCI_COMMAND register is typically all 0's after reset."
Why wouldn't it be ok? What is the exact concern here?
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -454,8 +454,7 @@ static void cmd_write(const struct pci_dev *pdev, 
>> unsigned int reg,
>>   pci_conf_write16(pdev->sbdf, reg, cmd);
>>   }
>>   
>> -static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>> -uint32_t cmd, void *data)
>> +static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
> The command register is a 16-bit one, so parameter and return type should
> either be plain unsigned int (preferred, see ./CODING_STYLE) or uint16_t
> imo.
Good catch, thank you.
> Jan
>
Thank you,
Oleksandr

Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices

2022-02-04 Thread Jan Beulich
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> Reset the command register when assigning a PCI device to a guest:
> according to the PCI spec the PCI_COMMAND register is typically all 0's
> after reset.

It's not entirely clear to me whether setting the hardware register to
zero is okay. What wants to be zero is the value the guest observes
initially.

> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -454,8 +454,7 @@ static void cmd_write(const struct pci_dev *pdev, 
> unsigned int reg,
>  pci_conf_write16(pdev->sbdf, reg, cmd);
>  }
>  
> -static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
> -uint32_t cmd, void *data)
> +static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)

The command register is a 16-bit one, so parameter and return type should
either be plain unsigned int (preferred, see ./CODING_STYLE) or uint16_t
imo.

Jan




Re: [PATCH v6 09/13] vpci/header: emulate PCI_COMMAND register for guests

2022-02-04 Thread Jan Beulich
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -454,6 +454,22 @@ static void cmd_write(const struct pci_dev *pdev, 
> unsigned int reg,
>  pci_conf_write16(pdev->sbdf, reg, cmd);
>  }
>  
> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
> +uint32_t cmd, void *data)
> +{
> +/* TODO: Add proper emulation for all bits of the command register. */
> +
> +#ifdef CONFIG_HAS_PCI_MSI
> +if ( pdev->vpci->msi->enabled || pdev->vpci->msix->enabled )
> +{
> +/* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X 
> enabled. */
> +cmd |= PCI_COMMAND_INTX_DISABLE;
> +}
> +#endif
> +
> +cmd_write(pdev, reg, cmd, data);
> +}

It's not really clear to me whether the TODO warrants this being a
separate function. Personally I'd find it preferable if the logic
was folded into cmd_write().

With this and ...

> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -70,6 +70,10 @@ static void control_write(const struct pci_dev *pdev, 
> unsigned int reg,
>  
>  if ( vpci_msi_arch_enable(msi, pdev, vectors) )
>  return;
> +
> +/* Make sure guest doesn't enable INTx while enabling MSI. */
> +if ( !is_hardware_domain(pdev->domain) )
> +pci_intx(pdev, false);
>  }
>  else
>  vpci_msi_arch_disable(msi, pdev);
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -92,6 +92,10 @@ static void control_write(const struct pci_dev *pdev, 
> unsigned int reg,
>  for ( i = 0; i < msix->max_entries; i++ )
>  if ( !msix->entries[i].masked && msix->entries[i].updated )
>  update_entry(&msix->entries[i], pdev, i);
> +
> +/* Make sure guest doesn't enable INTx while enabling MSI-X. */
> +if ( !is_hardware_domain(pdev->domain) )
> +pci_intx(pdev, false);
>  }
>  else if ( !new_enabled && msix->enabled )
>  {

... this done (as requested) behind the back of the guest, what's the
idea wrt the guest reading the command register? That continues to be
wired to vpci_hw_read16() (and hence accesses the underlying hardware
value irrespective of what patch 4 did).

Jan




Re: [PATCH v6 04/13] vpci: restrict unhandled read/write operations for guests

2022-02-04 Thread Oleksandr Andrushchenko


On 04.02.22 16:11, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>> A guest can read and write those registers which are not emulated and
>> have no respective vPCI handlers, so it can access the HW directly.
> I don't think this describes the present situation. Or did I miss where
> devices can actually be exposed to guests already, despite much of the
> support logic still missing?
No, they are not exposed yet and you know that.
I will update the commit message
>
>> In order to prevent a guest from reading and writing the unhandled
>> registers, make sure only the hardware domain can access the HW directly and
>> restrict guests from doing so.
> Tangential question: Going over the titles of the remaining patches I
> notice patch 6 is going to deal with BAR accesses. But (going just
> from the titles) I can't spot anywhere that vendor and device IDs
> would be exposed to guests. Yet that's the first thing guests will need
> in order to actually recognize devices. As said before, allowing guests
> access to such r/o fields is quite likely going to be fine.
Agree, I was thinking about adding such a patch to allow IDs,
but finally decided not to add more to this series.
Again, the whole thing is not working yet, and for development
this patch can/needs to be reverted. So, whether we implement IDs
or not, this doesn't change anything in this respect
>
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -215,11 +215,15 @@ int vpci_remove_register(struct vpci *vpci, unsigned 
>> int offset,
>>   }
>>   
>>   /* Wrappers for performing reads/writes to the underlying hardware. */
>> -static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned int reg,
>> +static uint32_t vpci_read_hw(bool is_hwdom, pci_sbdf_t sbdf, unsigned int 
>> reg,
>>unsigned int size)
> Was the passing around of a boolean the consensus which was reached?
Was this patch committed yet?
> Personally I'd find it more natural if the two functions checked
> current->domain themselves.
This is also possible, but I would like to hear Roger's view on this as well
I am fine either way
>
> Jan
>
Thank you,
Oleksandr

Re: [PATCH] tools/guest: Fix comment regarding CPUID compatibility

2022-02-04 Thread Roger Pau Monné
On Fri, Feb 04, 2022 at 02:10:03PM +, Andrew Cooper wrote:
> On 04/02/2022 13:46, Jan Beulich wrote:
> > On 04.02.2022 14:34, Andrew Cooper wrote:
> >> On 04/02/2022 13:09, Jan Beulich wrote:
> >>> On 04.02.2022 13:12, Andrew Cooper wrote:
>  On 04/02/2022 08:31, Jan Beulich wrote:
> > On 03.02.2022 19:10, Andrew Cooper wrote:
> >> It was Xen 4.14 where CPUID data was added to the migration stream, 
> >> and 4.13
> >> that we need to worry about with regards to compatibility.  Xen 4.12 
> >> isn't
> >> relevant.
> >>
> >> Expand and correct the commentary.
> >>
> >> Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore")
> > But doesn't this commit amend 685e922d6f30 ("tools/libxc: Rework
> > xc_cpuid_apply_policy() to use {get,set}_cpu_policy()"), which is
> > where DEF_MAX_* disappeared?
>  No. All that happened in that change was that we switched to using
> 
>  cpuid.h:89:#define CPUID_GUEST_NR_EXTD_AMD
> 
>  instead, which remained the same size until Xen 4.15 when e9b4fe26364
>  bumped it.
> >>> Oh, right. I did try to look for a replacement, but managed to miss
> >>> this. But then, as much as 4.12 isn't relevant, isn't it the case
> >>> that the fact that CPUID data was added to the stream in 4.14 isn't
> >>> relevant here either, and it's instead the bumping in 4.15 which is?
> >> The fact that the bump happened is relevant, by virtue of the fact that
> >> there was logic added to cope.  The fact it was in 4.15 is not relevant - this
> >> isn't a list of every ABI-relevant change.
> >>
> >> CPUID data being added to the stream is critically important, because
> >> that's the point after which we never enter this compatibility path.
> > If the bump happened before CPUID data was added to the stream, logic to
> > cope with migrating-in guests would have been required too, wouldn't it.
> 
> Yes, it would have been.
> 
> It wasn't an accident that none of the max leaves changed while doing
> the Xen CPUID work.
> 
> We're unfortunately a long way behind on Intel CPUID leaves, but all(?)
> of the new leaves need more complicated migration safety logic than the
> toolstack currently knows how to do.
> 
> > But anyway, just to be done with this:
> > Acked-by: Jan Beulich 
> 
> Thanks.

Will rebase my CPUID series on top of this, but I will wait for
further comments before sending a new version.

Roger.



Re: x86: insn-eval.c's use of native_store_gdt()

2022-02-04 Thread Jan Beulich
On 04.02.2022 15:08, Thomas Gleixner wrote:
> On Fri, Feb 04 2022 at 10:23, Jan Beulich wrote:
>> On 30.11.2021 12:09, Jan Beulich wrote:
>>> I think it is b968e84b509d ("x86/iopl: Fake iopl(3) CLI/STI usage")
>>> which uncovered an issue with get_desc() trying to access the GDT, as
>>> introduced by 670f928ba09b ("x86/insn-eval: Add utility function to
>>> get segment descriptor"). When running in a PV domain under Xen, the
>>> (hypervisor's) GDT isn't accessible; with UMIP enabled by Xen even
>>> SGDT wouldn't work, as the kernel runs in ring 3.
>>>
>>> There's currently no hypercall to retrieve a descriptor from the GDT.
>>> It is instead assumed that kernels know where their present GDT
>>> lives. Can the native_store_gdt() be replaced there, please?
>>>
>>> For context (I don't think it should matter much here) I'm observing
>>> this with the kernel put underneath a rather old distro, where
>>> hwclock triggers this path.
>>
>> I'd like to note that the issue still exists in 5.16.
> 
> I'd like to note that I've seen no patches to that effect.

I could have worded it that way as well, yes.

Jan




Re: [PATCH v6 04/13] vpci: restrict unhandled read/write operations for guests

2022-02-04 Thread Jan Beulich
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> A guest can read and write those registers which are not emulated and
> have no respective vPCI handlers, so it can access the HW directly.

I don't think this describes the present situation. Or did I miss where
devices can actually be exposed to guests already, despite much of the
support logic still missing?

> In order to prevent a guest from reading and writing the unhandled
> registers, make sure only the hardware domain can access the HW directly and
> restrict guests from doing so.

Tangential question: Going over the titles of the remaining patches I
notice patch 6 is going to deal with BAR accesses. But (going just
from the titles) I can't spot anywhere that vendor and device IDs
would be exposed to guests. Yet that's the first thing guests will need
in order to actually recognize devices. As said before, allowing guests
access to such r/o fields is quite likely going to be fine.

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -215,11 +215,15 @@ int vpci_remove_register(struct vpci *vpci, unsigned 
> int offset,
>  }
>  
>  /* Wrappers for performing reads/writes to the underlying hardware. */
> -static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned int reg,
> +static uint32_t vpci_read_hw(bool is_hwdom, pci_sbdf_t sbdf, unsigned int 
> reg,
>   unsigned int size)

Was the passing around of a boolean the consensus which was reached?
Personally I'd find it more natural if the two functions checked
current->domain themselves.

Jan




[seabios test] 168003: tolerable FAIL - PUSHED

2022-02-04 Thread osstest service owner
flight 168003 seabios real [real]
flight 168009 seabios real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/168003/
http://logs.test-lab.xenproject.org/osstest/logs/168009/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 
168009-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167920
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167920
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167920
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167920
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167920
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass

version targeted for testing:
 seabios  829b0f1a7cda1bccdf44a379fb3a96e519a7e8cd
baseline version:
 seabios  dc776a2d9ca9e1b857e880ff682668871369b4c3

Last test of basis   167920  2022-01-27 16:42:48 Z    7 days
Testing same since   168003  2022-02-03 23:10:24 Z    0 days    1 attempts


People who touched revisions under test:
  Florian Larysch 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm pass
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  fail
 test-amd64-amd64-qemuu-nested-amdfail
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-qemuu-freebsd11-amd64   pass
 test-amd64-amd64-qemuu-freebsd12-amd64   pass
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-ws16-amd64 fail
 test-amd64-i386-xl-qemuu-ws16-amd64  fail
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrictpass
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict pass
 test-amd64-amd64-qemuu-nested-intel  pass
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/seabios.git
   dc776a2..829b0f1  829b0f1a7cda1bccdf44a379fb3a96e519a7e8cd -> 
xen-tested-master



Re: [PATCH] tools/guest: Fix comment regarding CPUID compatibility

2022-02-04 Thread Andrew Cooper
On 04/02/2022 13:46, Jan Beulich wrote:
> On 04.02.2022 14:34, Andrew Cooper wrote:
>> On 04/02/2022 13:09, Jan Beulich wrote:
>>> On 04.02.2022 13:12, Andrew Cooper wrote:
 On 04/02/2022 08:31, Jan Beulich wrote:
> On 03.02.2022 19:10, Andrew Cooper wrote:
>> It was Xen 4.14 where CPUID data was added to the migration stream, and 
>> 4.13
>> that we need to worry about with regards to compatibility.  Xen 4.12 
>> isn't
>> relevant.
>>
>> Expand and correct the commentary.
>>
>> Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore")
> But doesn't this commit amend 685e922d6f30 ("tools/libxc: Rework
> xc_cpuid_apply_policy() to use {get,set}_cpu_policy()"), which is
> where DEF_MAX_* disappeared?
 No. All that happened in that change was that we switched to using

 cpuid.h:89:#define CPUID_GUEST_NR_EXTD_AMD

 instead, which remained the same size until Xen 4.15 when e9b4fe26364
 bumped it.
>>> Oh, right. I did try to look for a replacement, but managed to miss
>>> this. But then, as much as 4.12 isn't relevant, isn't it the case
>>> that the fact that CPUID data was added to the stream in 4.14 isn't
>>> relevant here either, and it's instead the bumping in 4.15 which is?
>> The fact that the bump happened is relevant, by virtue of the fact that
>> there was logic added to cope.  The fact it was in 4.15 is not relevant - this
>> isn't a list of every ABI-relevant change.
>>
>> CPUID data being added to the stream is critically important, because
>> that's the point after which we never enter this compatibility path.
> If the bump happened before CPUID data was added to the stream, logic to
> cope with migrating-in guests would have been required too, wouldn't it.

Yes, it would have been.

It wasn't an accident that none of the max leaves changed while doing
the Xen CPUID work.

We're unfortunately a long way behind on Intel CPUID leaves, but all(?)
of the new leaves need more complicated migration safety logic than the
toolstack currently knows how to do.

> But anyway, just to be done with this:
> Acked-by: Jan Beulich 

Thanks.

~Andrew



Re: x86: insn-eval.c's use of native_store_gdt()

2022-02-04 Thread Thomas Gleixner
On Fri, Feb 04 2022 at 10:23, Jan Beulich wrote:
> On 30.11.2021 12:09, Jan Beulich wrote:
>> I think it is b968e84b509d ("x86/iopl: Fake iopl(3) CLI/STI usage")
>> which uncovered an issue with get_desc() trying to access the GDT, as
>> introduced by 670f928ba09b ("x86/insn-eval: Add utility function to
>> get segment descriptor"). When running in a PV domain under Xen, the
>> (hypervisor's) GDT isn't accessible; with UMIP enabled by Xen even
>> SGDT wouldn't work, as the kernel runs in ring 3.
>> 
>> There's currently no hypercall to retrieve a descriptor from the GDT.
>> It is instead assumed that kernels know where their present GDT
>> lives. Can the native_store_gdt() be replaced there, please?
>> 
>> For context (I don't think it should matter much here) I'm observing
>> this with the kernel put underneath a rather old distro, where
>> hwclock triggers this path.
>
> I'd like to note that the issue still exists in 5.16.

I'd like to note that I've seen no patches to that effect.

Thanks,

tglx



Re: Xen data from meta-virtualization layer

2022-02-04 Thread Michael Walle
Hi all,

> + Zhiqiang Hou
> 
> On Tue, 24 Nov 2020, Leo Krueger wrote:
> > > >>> On Tue, 17 Nov 2020, Leo Krueger wrote:
> > >  Hi,
> > > 
> > >  I enabled CONFIG_HAS_ITS (what a stupid mistake by me to not set it
> > >  before...) but then had to add the following node to my device tree
> > > 
> > >   gic_lpi_base: syscon@0x80000000 {
> > >   compatible = "gic-lpi-base";
> > > >>
> > > >> I couldn't find this compatible defined/used in Linux 5.10-rc4. @Leo,
> > > >> could you clarify which flavor/version of Linux you are using?
> > > >
> > > > It is Linux 4.19 from Yocto (Warror release). XEN 4.13.2.
> > > 
> > > Do you have a link to the Linux tree? Is there any additional patches on 
> > > top of
> > > vanilla?
> > 
> > Linux tree is found here: 
> > https://github.com/kontron/linux-smarc-sal28/commits/master-LSDK-19.09
> > (up to the latest commit in that branch)

FWIW, I'm the developer of the support for this board both in the
mentioned branch as well as upstream.

If you don't need graphics you shouldn't really be using the branch
above, but the latest vanilla kernel release.

> [...]
> 
> > > Looking at the DT changes in [0], it looks like the node is not a child 
> > > of gic@.
> > > So I think Xen will map the region to Dom0.
> > > 
> > > There are two things that I can notice:
> > >1) This region is RAM, but I can't find any reserve node. Is there any 
> > > specific
> > > code in Linux to reserve it?
> > >2) The implementation in U-boot seems to suggest that the firmware will
> > > configure the LPIs and then enable it. If that's the case, then Xen needs 
> > > to
> > > re-use the table in the DT rather than allocating a new one.
> 
> That Linux tree has no mentions of gic-lpi-base. That means that
> gic-lpi-base is only used in u-boot, not in Linux. In particular the
> most relevant commit is af288cb291da3abef6be0875527729296f7de7a0. 

That node was a horrible hack from NXP for U-Boot and was removed in
a84cea06bb8fff69810a890ac0e4b47ea5726512.

> In regards to the reserved-memory regions, maybe we are not seeing them
> because Leo posted the host device tree, not the one passed at runtime
> from u-boot to Linux?
> 
> If so, Leo, could you please boot Linux on native (no Xen) and get the
> device tree from there at runtime using dtc -I fs -O dts
> /proc/device-tree ?
> 
> 
> However, the name of the reserved-memory region created by u-boot seems
> to be "lpi_rd_table". I cannot find any mentions of lpi_rd_table in the
> Linux kernel tree either.
> 
> Zhiqiang, Leo is trying to boot Xen on sAL28. Linux booting on Xen
> throws errors in regards to GIC/ITS initialization. On other hardware
> Xen can use and virtualize GICv3 and ITS just fine. Could you please
> explain what is different about sAL28 and how Xen/Linux is expected to
> use the lpi_rd_table reserved-memory region?

I actually stumbled across this thread after trying out Xen myself. I'm
using lastest vanilla u-boot (with pending PSCI patches), vanilla kernel
and vanilla Xen.

So far I've discovered that Xen complains that it cannot route IRQ64 to
dom0. That is because on the LS1028A there is a dual UART (two 16550s
with one shared interrupt): Xen takes the first UART and then tries
to map the interrupt of the second UART to Linux. For now, I don't know
how this is solved correctly. As a quick hack, I removed the second UART
node from the device tree.

But what is more severe is that the iommu isn't set up correctly. I'm
getting the following faults:

(XEN) smmu: /soc/iommu@5000000: Unexpected global fault, this could be serious
(XEN) smmu: /soc/iommu@5000000: GFSR 0x80000002, GFSYNR0 0x00000000, 
GFSYNR1 0x0000042a, GFSYNR2 0x00000000

If I decode it correctly, the streamid should be 0x2a which would be one
of the PCI devices on the internal root complex. Probably the network
card.

This is my first developer experience with Xen, so please bear with me
:) It seems that Xen doesn't add the master to the IOMMU. To me it seems
that only devices with an 'iommus' DT property are added. But in the case
of PCI devices the parent only has an 'iommu-map' property.

And it makes me wonder why Leo has an almost working setup. Maybe I'm
missing some patches though.

-michael



[PATCH] x86/Intel: don't log bogus frequency range on Core/Core2 processors

2022-02-04 Thread Jan Beulich
Models 0F and 17 don't have PLATFORM_INFO documented. While it exists on
at least model 0F, the information there doesn't match the scheme used
on newer models (I'm observing a range of 700 ... 600 MHz reported on a
Xeon E5345).

Sadly the Enhanced Intel Core instance of the table entry is not self-
consistent: The numeric description of the low 3 bits doesn't match the
subsequent more textual description in some of the cases; I'm using the
former here.

Include the older Core model 0E as well as the two other Core2 models,
none of which have respective MSR tables in the SDM.

Fixes: f6b6517cd5db ("x86: retrieve and log CPU frequency information")
Signed-off-by: Jan Beulich 
---
While the SDM table for the two models lists FSB_FREQ, I'm afraid its
information is of little use here: If anything it could serve as a
reference for the frequency determined by calibrate_APIC_clock().
---
RFC: I may want to rebase over Roger's addition of intel-family.h, but
 first of all I wanted to see whether going this route is deemed
 acceptable at all.

--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -435,6 +435,26 @@ static void intel_log_freq(const struct
 if ( c->x86 == 6 )
 switch ( c->x86_model )
 {
+static const unsigned short core_factors[] =
+{ 26667, 13333, 20000, 16667, 33333, 10000, 40000 };
+
+case 0x0e: /* Core */
+case 0x0f: case 0x16: case 0x17: case 0x1d: /* Core2 */
+/*
+ * PLATFORM_INFO, while not documented for these, appears to
+ * exist in at least some cases, but what it holds doesn't
+ * match the scheme used by newer CPUs.  At a guess, the min
+ * and max fields look to be reversed, while the scaling
+ * factor is encoded in FSB_FREQ.
+ */
+if ( min_ratio > max_ratio )
+SWAP(min_ratio, max_ratio);
+if ( rdmsr_safe(MSR_FSB_FREQ, msrval) ||
+ (msrval &= 7) >= ARRAY_SIZE(core_factors) )
+return;
+factor = core_factors[msrval];
+break;
+
 case 0x1a: case 0x1e: case 0x1f: case 0x2e: /* Nehalem */
 case 0x25: case 0x2c: case 0x2f: /* Westmere */
 factor = 13333;




Re: [PATCH v2] docs: document patch rules

2022-02-04 Thread Jan Beulich
On 03.02.2022 13:54, Juergen Gross wrote:
> Add a document to describe the rules for sending a proper patch.
> 
> As it contains all the information already being present in
> docs/process/tags.pandoc remove that file.
> 
> The "Reviewed-by:" and "Acked-by:" tags are expanded to allow an
> optional restriction of the tag.
> 
> A new tag "Origin:" is added to tag patches taken from another project.
> 
> Signed-off-by: Juergen Gross 

Acked-by: Jan Beulich 




Re: [PATCH] tools/guest: Fix comment regarding CPUID compatibility

2022-02-04 Thread Jan Beulich
On 04.02.2022 14:34, Andrew Cooper wrote:
> On 04/02/2022 13:09, Jan Beulich wrote:
>> On 04.02.2022 13:12, Andrew Cooper wrote:
>>> On 04/02/2022 08:31, Jan Beulich wrote:
 On 03.02.2022 19:10, Andrew Cooper wrote:
> It was Xen 4.14 where CPUID data was added to the migration stream, and 
> 4.13
> that we need to worry about with regards to compatibility.  Xen 4.12 isn't
> relevant.
>
> Expand and correct the commentary.
>
> Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore")
 But doesn't this commit amend 685e922d6f30 ("tools/libxc: Rework
 xc_cpuid_apply_policy() to use {get,set}_cpu_policy()"), which is
 where DEF_MAX_* disappeared?
>>> No. All that happened in that change was that we switched to using
>>>
>>> cpuid.h:89:#define CPUID_GUEST_NR_EXTD_AMD
>>>
>>> instead, which remained the same size until Xen 4.15 when e9b4fe26364
>>> bumped it.
>> Oh, right. I did try to look for a replacement, but managed to miss
>> this. But then, as much as 4.12 isn't relevant, isn't it the case
>> that the fact that CPUID data was added to the stream in 4.14 isn't
>> relevant here either, and it's instead the bumping in 4.15 which is?
> 
> The fact that the bump happened is relevant, by virtue of the fact that
> there was logic added to cope.  The fact it was in 4.15 is not relevant - this
> isn't a list of every ABI-relevant change.
> 
> CPUID data being added to the stream is critically important, because
> that's the point after which we never enter this compatibility path.

If the bump happened before CPUID data was added to the stream, logic to
cope with migrating-in guests would have been required too, wouldn't it.

But anyway, just to be done with this:
Acked-by: Jan Beulich 

Jan




Re: [PATCH] tools/guest: Fix comment regarding CPUID compatibility

2022-02-04 Thread Andrew Cooper
On 04/02/2022 13:09, Jan Beulich wrote:
> On 04.02.2022 13:12, Andrew Cooper wrote:
>> On 04/02/2022 08:31, Jan Beulich wrote:
>>> On 03.02.2022 19:10, Andrew Cooper wrote:
 It was Xen 4.14 where CPUID data was added to the migration stream, and 
 4.13
 that we need to worry about with regards to compatibility.  Xen 4.12 isn't
 relevant.

 Expand and correct the commentary.

 Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore")
>>> But doesn't this commit amend 685e922d6f30 ("tools/libxc: Rework
>>> xc_cpuid_apply_policy() to use {get,set}_cpu_policy()"), which is
>>> where DEF_MAX_* disappeared?
>> No. All that happened in that change was that we switched to using
>>
>> cpuid.h:89:#define CPUID_GUEST_NR_EXTD_AMD
>>
>> instead, which remained the same size until Xen 4.15 when e9b4fe26364
>> bumped it.
> Oh, right. I did try to look for a replacement, but managed to miss
> this. But then, as much as 4.12 isn't relevant, isn't it the case
> that the fact that CPUID data was added to the stream in 4.14 isn't
> relevant here either, and it's instead the bumping in 4.15 which is?

The fact that the bump happened is relevant, by virtue of the fact that
there was logic added to cope.  The fact it was in 4.15 is not relevant - this
isn't a list of every ABI-relevant change.

CPUID data being added to the stream is critically important, because
that's the point after which we never enter this compatibility path.

~Andrew


Re: [PATCH] tools/guest: Fix comment regarding CPUID compatibility

2022-02-04 Thread Jan Beulich
On 04.02.2022 13:12, Andrew Cooper wrote:
> On 04/02/2022 08:31, Jan Beulich wrote:
>> On 03.02.2022 19:10, Andrew Cooper wrote:
>>> It was Xen 4.14 where CPUID data was added to the migration stream, and 4.13
>>> that we need to worry about with regards to compatibility.  Xen 4.12 isn't
>>> relevant.
>>>
>>> Expand and correct the commentary.
>>>
>>> Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore")
>> But doesn't this commit amend 685e922d6f30 ("tools/libxc: Rework
>> xc_cpuid_apply_policy() to use {get,set}_cpu_policy()"), which is
>> where DEF_MAX_* disappeared?
> 
> No. All that happened in that change was that we switched to using
> 
> cpuid.h:89:#define CPUID_GUEST_NR_EXTD_AMD
> 
> instead, which remained the same size until Xen 4.15 when e9b4fe26364
> bumped it.

Oh, right. I did try to look for a replacement, but managed to miss
this. But then, as much as 4.12 isn't relevant, isn't it the case
that the fact that CPUID data was added to the stream in 4.14 isn't
relevant here either, and it's instead the bumping in 4.15 which is?
IOW while I've been fine with the comment adjustment anyway, there
would still want to be an adjustment to the description.

>> While looking at this, wasn't Roger's change incomplete, in that
>> for Intel the extended leaf upper bound was 0x80000008 in 4.12?
> 
> CPUID_GUEST_NR_EXTD_INTEL is still 8, so this is all fine.

Again, somehow I did manage to miss the replacement defines.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Roger Pau Monné
On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote:
> 
> 
> On 04.02.22 14:47, Jan Beulich wrote:
> > On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
> >>
> >> On 04.02.22 13:37, Jan Beulich wrote:
> >>> On 04.02.2022 12:13, Roger Pau Monné wrote:
>  On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> > On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> >> On 04.02.22 11:15, Jan Beulich wrote:
> >>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>  On 04.02.22 09:52, Jan Beulich wrote:
> > On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> >> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev 
> >> *pdev, uint16_t cmd, bool rom_only)
> >>  continue;
> >>  }
> >>  
> >> +spin_lock(&tmp->vpci_lock);
> >> +if ( !tmp->vpci )
> >> +{
> >> +spin_unlock(&tmp->vpci_lock);
> >> +continue;
> >> +}
> >>  for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); 
> >> i++ )
> >>  {
> >>  const struct vpci_bar *bar = 
> >> &tmp->vpci->header.bars[i];
> >> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev 
> >> *pdev, uint16_t cmd, bool rom_only)
> >>  rc = rangeset_remove_range(mem, start, end);
> >>  if ( rc )
> >>  {
> >> +spin_unlock(&tmp->vpci_lock);
> >>  printk(XENLOG_G_WARNING "Failed to remove 
> >> [%lx, %lx]: %d\n",
> >> start, end, rc);
> >>  rangeset_destroy(mem);
> >>  return rc;
> >>  }
> >>  }
> >> +spin_unlock(&tmp->vpci_lock);
> >>  }
> > At the first glance this simply looks like another unjustified (in 
> > the
> > description) change, as you're not converting anything here but you
> > actually add locking (and I realize this was there before, so I'm 
> > sorry
> > for not pointing this out earlier).
>  Well, I thought that the description already has "...the lock can be
>  used (and in a few cases is used right away) to check whether vpci
>  is present" and this is enough for such uses as here.
> >  But then I wonder whether you
> > actually tested this, since I can't help getting the impression that
> > you're introducing a live-lock: The function is called from 
> > cmd_write()
> > and rom_write(), which in turn are called out of vpci_write(). Yet 
> > that
> > function already holds the lock, and the lock is not (currently)
> > recursive. (For the 3rd caller of the function - init_bars() - otoh
> > the locking looks to be entirely unnecessary.)
>  Well, you are correct: if tmp != pdev then it is correct to acquire
>  the lock. But if tmp == pdev and rom_only == true
>  then we'll deadlock.
> 
>  It seems we need to have the locking conditional, e.g. only lock
>  if tmp != pdev
> >>> Which will address the live-lock, but introduce ABBA deadlock 
> >>> potential
> >>> between the two locks.
> >> I am not sure I can suggest a better solution here
> >> @Roger, @Jan, could you please help here?
> > Well, first of all I'd like to mention that while it may have been okay 
> > to
> > not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
> > dealing
> > with DomU-s' lists of PCI devices. The requirement really applies to the
> > other use of for_each_pdev() as well (in vpci_dump_msi()), except that
> > there it probably wants to be a try-lock.
> >
> > Next I'd like to point out that here we have the still pending issue of
> > how to deal with hidden devices, which Dom0 can access. See my RFC patch
> > "vPCI: account for hidden devices in modify_bars()". Whatever the 
> > solution
> > here, I think it wants to at least account for the extra need there.
>  Yes, sorry, I should take care of that.
> 
> > Now it is quite clear that pcidevs_lock isn't going to help with 
> > avoiding
> > the deadlock, as it's imo not an option at all to acquire that lock
> > everywhere else you access ->vpci (or else the vpci lock itself would be
> > pointless). But a per-domain auxiliary r/w lock may help: Other paths
> > would acquire it in read mode, and here you'd acquire it in write mode 
> > (in
> > the former case around the vpci lock, while in the latter case there may
> > then not be any need to acquire the individual vpci locks at all). 
> > FTAOD:
> 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Jan Beulich
On 04.02.2022 13:53, Oleksandr Andrushchenko wrote:
> 
> 
> On 04.02.22 14:47, Jan Beulich wrote:
>> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
>>>
>>> On 04.02.22 13:37, Jan Beulich wrote:
 On 04.02.2022 12:13, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>>> On 04.02.22 11:15, Jan Beulich wrote:
 On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> On 04.02.22 09:52, Jan Beulich wrote:
>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>>              continue;
>>>          }
>>>  
>>> +        spin_lock(&tmp->vpci_lock);
>>> +        if ( !tmp->vpci )
>>> +        {
>>> +            spin_unlock(&tmp->vpci_lock);
>>> +            continue;
>>> +        }
>>>          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>>          {
>>>              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>>              rc = rangeset_remove_range(mem, start, end);
>>>              if ( rc )
>>>              {
>>> +                spin_unlock(&tmp->vpci_lock);
>>>                  printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>>>                         start, end, rc);
>>>                  rangeset_destroy(mem);
>>>                  return rc;
>>>              }
>>>          }
>>> +        spin_unlock(&tmp->vpci_lock);
>>>      }
>> At the first glance this simply looks like another unjustified (in 
>> the
>> description) change, as you're not converting anything here but you
>> actually add locking (and I realize this was there before, so I'm 
>> sorry
>> for not pointing this out earlier).
> Well, I thought that the description already has "...the lock can be
> used (and in a few cases is used right away) to check whether vpci
> is present" and this is enough for such uses as here.
>>  But then I wonder whether you
>> actually tested this, since I can't help getting the impression that
>> you're introducing a live-lock: The function is called from 
>> cmd_write()
>> and rom_write(), which in turn are called out of vpci_write(). Yet 
>> that
>> function already holds the lock, and the lock is not (currently)
>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>> the locking looks to be entirely unnecessary.)
> Well, you are correct: if tmp != pdev then it is correct to acquire
> the lock. But if tmp == pdev and rom_only == true
> then we'll deadlock.
>
> It seems we need to have the locking conditional, e.g. only lock
> if tmp != pdev
 Which will address the live-lock, but introduce ABBA deadlock potential
 between the two locks.
>>> I am not sure I can suggest a better solution here
>>> @Roger, @Jan, could you please help here?
>> Well, first of all I'd like to mention that while it may have been okay 
>> to
>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
>> dealing
>> with DomU-s' lists of PCI devices. The requirement really applies to the
>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
>> there it probably wants to be a try-lock.
>>
>> Next I'd like to point out that here we have the still pending issue of
>> how to deal with hidden devices, which Dom0 can access. See my RFC patch
>> "vPCI: account for hidden devices in modify_bars()". Whatever the 
>> solution
>> here, I think it wants to at least account for the extra need there.
> Yes, sorry, I should take care of that.
>
>> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
>> the deadlock, as it's imo not an option at all to acquire that lock
>> everywhere else you access ->vpci (or else the vpci lock itself would be
>> pointless). But a per-domain auxiliary r/w lock may help: Other paths
>> would acquire it in read mode, and here you'd acquire it in write mode 
>> (in
>> the former case around the vpci lock, while in the latter case there may
>> then not be any need to acquire the individual vpci locks at all). FTAOD:
>> I haven't fully thought through all implications (and hence whether this 
>> is
>> viable in the first place); I expect you 

Re: [PATCH] xen: Modify domain_crash() to take a print string

2022-02-04 Thread Jan Beulich
On 04.02.2022 12:56, Andrew Cooper wrote:
> On 03/02/2022 15:06, Jan Beulich wrote:
>> On 03.02.2022 15:41, Andrew Cooper wrote:
>>> On 03/02/2022 14:19, Jan Beulich wrote:
 On 03.02.2022 15:11, Andrew Cooper wrote:
> On 03/02/2022 13:48, Julien Grall wrote:
>> On 03/02/2022 13:38, Andrew Cooper wrote:
>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>> index 37f78cc4c4c9..38b390d20371 100644
>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v);
>>>    * from any processor.
>>>    */
>>>   void __domain_crash(struct domain *d);
>>> -#define domain_crash(d) do {                                      \
>>> -    printk("domain_crash called from %s:%d\n", __FILE__, __LINE__); \
>>> -    __domain_crash(d);                                            \
>>> -} while (0)
>>> +#define domain_crash(d, ...)    \
>>> +    do {    \
>>> +    if ( count_args(__VA_ARGS__) == 0 ) \
>>> +    printk("domain_crash called from %s:%d\n",  \
>>> +   __FILE__, __LINE__); \
>> I find a bit odd that here you are using a normal printk
> That's unmodified from before.  Only reformatted.
>
>> but...
>>
>>
>>> +    else    \
>>> +    printk(XENLOG_G_ERR __VA_ARGS__);   \
>> here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so,
>> wouldn't it be better to only use XENLOG_ERR so they can always be
>> seen? (A domain shouldn't be able to abuse it).
> Perhaps.  I suppose it is more important information than pretty much
> anything else about the guest.
 Indeed, but then - is this really an error in all cases?
>>> Yes.  It is always a fatal event for the VM.
>> Which may or may not be Xen's fault. If the guest put itself in a bad
>> state, I don't see why we shouldn't consider such just a warning.
> 
> Log level is the severity of the action, not who's potentially to blame
> for causing the situation.
> 
> Furthermore, almost all callers who do emit appropriate diagnostics
> before domain_crash() already use ERR.

Well, yes, this does indeed look to be the case (albeit frequently with
gdprintk(), in which case I'm inclined to say the log level isn't
very relevant in the first place). On this basis I'm willing to give
in, despite continuing not to be fully convinced.

Jan




Re: [PATCH] xen: Modify domain_crash() to take a print string

2022-02-04 Thread Jan Beulich
On 03.02.2022 14:38, Andrew Cooper wrote:
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1693,11 +1693,8 @@ static void load_segments(struct vcpu *n)
>   put_guest(uregs->fs,   esp - 5) |
>   put_guest(uregs->es,   esp - 6) |
>   put_guest(uregs->ds,   esp - 7) )
> -        {
> -            gprintk(XENLOG_ERR,
> -                    "error while creating compat failsafe callback frame\n");
> -            domain_crash(n->domain);
> -        }
> +            domain_crash(n->domain,
> +                         "Error creating compat failsafe callback frame\n");
>  
>  if ( n->arch.pv.vgc_flags & VGCF_failsafe_disables_events )
>  vcpu_info(n, evtchn_upcall_mask) = 1;
> @@ -1732,11 +1729,8 @@ static void load_segments(struct vcpu *n)
>   put_guest(uregs->ds,   rsp -  9) |
>   put_guest(regs->r11,   rsp - 10) |
>   put_guest(regs->rcx,   rsp - 11) )
> -        {
> -            gprintk(XENLOG_ERR,
> -                    "error while creating failsafe callback frame\n");
> -            domain_crash(n->domain);
> -        }
> +            domain_crash(n->domain,
> +                         "Error creating failsafe callback frame\n");

I assume it wasn't really intended to hide potentially relevant information
(the subject vCPU) by this change, which - by way of gprintk() - did get
logged before (since we already have n == current at this point)?

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Oleksandr Andrushchenko


On 04.02.22 14:47, Jan Beulich wrote:
> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
>>
>> On 04.02.22 13:37, Jan Beulich wrote:
>>> On 04.02.2022 12:13, Roger Pau Monné wrote:
 On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>> On 04.02.22 11:15, Jan Beulich wrote:
>>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
 On 04.02.22 09:52, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>              continue;
>>          }
>>  
>> +        spin_lock(&tmp->vpci_lock);
>> +        if ( !tmp->vpci )
>> +        {
>> +            spin_unlock(&tmp->vpci_lock);
>> +            continue;
>> +        }
>>          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>          {
>>              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>              rc = rangeset_remove_range(mem, start, end);
>>              if ( rc )
>>              {
>> +                spin_unlock(&tmp->vpci_lock);
>>                  printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>>                         start, end, rc);
>>                  rangeset_destroy(mem);
>>                  return rc;
>>              }
>>          }
>> +        spin_unlock(&tmp->vpci_lock);
>>      }
> At the first glance this simply looks like another unjustified (in the
> description) change, as you're not converting anything here but you
> actually add locking (and I realize this was there before, so I'm 
> sorry
> for not pointing this out earlier).
 Well, I thought that the description already has "...the lock can be
 used (and in a few cases is used right away) to check whether vpci
 is present" and this is enough for such uses as here.
>  But then I wonder whether you
> actually tested this, since I can't help getting the impression that
> you're introducing a live-lock: The function is called from 
> cmd_write()
> and rom_write(), which in turn are called out of vpci_write(). Yet 
> that
> function already holds the lock, and the lock is not (currently)
> recursive. (For the 3rd caller of the function - init_bars() - otoh
> the locking looks to be entirely unnecessary.)
 Well, you are correct: if tmp != pdev then it is correct to acquire
 the lock. But if tmp == pdev and rom_only == true
 then we'll deadlock.

 It seems we need to have the locking conditional, e.g. only lock
 if tmp != pdev
>>> Which will address the live-lock, but introduce ABBA deadlock potential
>>> between the two locks.
>> I am not sure I can suggest a better solution here
>> @Roger, @Jan, could you please help here?
> Well, first of all I'd like to mention that while it may have been okay to
> not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
> dealing
> with DomU-s' lists of PCI devices. The requirement really applies to the
> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
> there it probably wants to be a try-lock.
>
> Next I'd like to point out that here we have the still pending issue of
> how to deal with hidden devices, which Dom0 can access. See my RFC patch
> "vPCI: account for hidden devices in modify_bars()". Whatever the solution
> here, I think it wants to at least account for the extra need there.
 Yes, sorry, I should take care of that.

> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
> the deadlock, as it's imo not an option at all to acquire that lock
> everywhere else you access ->vpci (or else the vpci lock itself would be
> pointless). But a per-domain auxiliary r/w lock may help: Other paths
> would acquire it in read mode, and here you'd acquire it in write mode (in
> the former case around the vpci lock, while in the latter case there may
> then not be any need to acquire the individual vpci locks at all). FTAOD:
> I haven't fully thought through all implications (and hence whether this 
> is
> viable in the first place); I expect you will, documenting what you've
> found in the resulting patch description. Of course the double lock
> acquire/release would then likely want hiding in helper functions.
 I've 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Jan Beulich
On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
> 
> 
> On 04.02.22 13:37, Jan Beulich wrote:
>> On 04.02.2022 12:13, Roger Pau Monné wrote:
>>> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
 On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> On 04.02.22 11:15, Jan Beulich wrote:
>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>>> On 04.02.22 09:52, Jan Beulich wrote:
 On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>              continue;
>          }
> 
> +        spin_lock(&tmp->vpci_lock);
> +        if ( !tmp->vpci )
> +        {
> +            spin_unlock(&tmp->vpci_lock);
> +            continue;
> +        }
>          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>          {
>              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>              rc = rangeset_remove_range(mem, start, end);
>              if ( rc )
>              {
> +                spin_unlock(&tmp->vpci_lock);
>                  printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>                         start, end, rc);
>                  rangeset_destroy(mem);
>                  return rc;
>              }
>          }
> +        spin_unlock(&tmp->vpci_lock);
>      }
 At the first glance this simply looks like another unjustified (in the
 description) change, as you're not converting anything here but you
 actually add locking (and I realize this was there before, so I'm sorry
 for not pointing this out earlier).
>>> Well, I thought that the description already has "...the lock can be
>>> used (and in a few cases is used right away) to check whether vpci
>>> is present" and this is enough for such uses as here.
 But then I wonder whether you
 actually tested this, since I can't help getting the impression that
 you're introducing a live-lock: The function is called from cmd_write()
 and rom_write(), which in turn are called out of vpci_write(). Yet that
 function already holds the lock, and the lock is not (currently)
 recursive. (For the 3rd caller of the function - init_bars() - otoh
 the locking looks to be entirely unnecessary.)
>>> Well, you are correct: if tmp != pdev then it is correct to acquire
>>> the lock. But if tmp == pdev and rom_only == true
>>> then we'll deadlock.
>>>
>>> It seems we need to have the locking conditional, e.g. only lock
>>> if tmp != pdev
>> Which will address the live-lock, but introduce ABBA deadlock potential
>> between the two locks.
> I am not sure I can suggest a better solution here
> @Roger, @Jan, could you please help here?
 Well, first of all I'd like to mention that while it may have been okay to
 not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
 with DomU-s' lists of PCI devices. The requirement really applies to the
 other use of for_each_pdev() as well (in vpci_dump_msi()), except that
 there it probably wants to be a try-lock.

 Next I'd like to point out that here we have the still pending issue of
 how to deal with hidden devices, which Dom0 can access. See my RFC patch
 "vPCI: account for hidden devices in modify_bars()". Whatever the solution
 here, I think it wants to at least account for the extra need there.
>>> Yes, sorry, I should take care of that.
>>>
 Now it is quite clear that pcidevs_lock isn't going to help with avoiding
 the deadlock, as it's imo not an option at all to acquire that lock
 everywhere else you access ->vpci (or else the vpci lock itself would be
 pointless). But a per-domain auxiliary r/w lock may help: Other paths
 would acquire it in read mode, and here you'd acquire it in write mode (in
 the former case around the vpci lock, while in the latter case there may
 then not be any need to acquire the individual vpci locks at all). FTAOD:
 I haven't fully thought through all implications (and hence whether this is
 viable in the first place); I expect you will, documenting what you've
 found in the resulting patch description. Of course the double lock
 acquire/release would then likely want hiding in helper functions.
>>> I've been also thinking about this, and whether it's really worth to
>>> have a per-device lock rather than a per-domain one that protects all
>>> vpci regions of the devices assigned to the domain.
>>>
>>> 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Oleksandr Andrushchenko


On 04.02.22 13:37, Jan Beulich wrote:
> On 04.02.2022 12:13, Roger Pau Monné wrote:
>> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
 On 04.02.22 11:15, Jan Beulich wrote:
> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>> On 04.02.22 09:52, Jan Beulich wrote:
>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
 @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
              continue;
          }
  
 +        spin_lock(&tmp->vpci_lock);
 +        if ( !tmp->vpci )
 +        {
 +            spin_unlock(&tmp->vpci_lock);
 +            continue;
 +        }
          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
          {
              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
 @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
              rc = rangeset_remove_range(mem, start, end);
              if ( rc )
              {
 +                spin_unlock(&tmp->vpci_lock);
                  printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
                         start, end, rc);
                  rangeset_destroy(mem);
                  return rc;
              }
          }
 +        spin_unlock(&tmp->vpci_lock);
      }
>>> At the first glance this simply looks like another unjustified (in the
>>> description) change, as you're not converting anything here but you
>>> actually add locking (and I realize this was there before, so I'm sorry
>>> for not pointing this out earlier).
>> Well, I thought that the description already has "...the lock can be
>> used (and in a few cases is used right away) to check whether vpci
>> is present" and this is enough for such uses as here.
>>> But then I wonder whether you
>>> actually tested this, since I can't help getting the impression that
>>> you're introducing a live-lock: The function is called from cmd_write()
>>> and rom_write(), which in turn are called out of vpci_write(). Yet that
>>> function already holds the lock, and the lock is not (currently)
>>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>>> the locking looks to be entirely unnecessary.)
>> Well, you are correct: if tmp != pdev then it is correct to acquire
>> the lock. But if tmp == pdev and rom_only == true
>> then we'll deadlock.
>>
>> It seems we need to have the locking conditional, e.g. only lock
>> if tmp != pdev
> Which will address the live-lock, but introduce ABBA deadlock potential
> between the two locks.
 I am not sure I can suggest a better solution here
 @Roger, @Jan, could you please help here?
>>> Well, first of all I'd like to mention that while it may have been okay to
>>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
>>> with DomU-s' lists of PCI devices. The requirement really applies to the
>>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
>>> there it probably wants to be a try-lock.
>>>
>>> Next I'd like to point out that here we have the still pending issue of
>>> how to deal with hidden devices, which Dom0 can access. See my RFC patch
>>> "vPCI: account for hidden devices in modify_bars()". Whatever the solution
>>> here, I think it wants to at least account for the extra need there.
>> Yes, sorry, I should take care of that.
>>
>>> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
>>> the deadlock, as it's imo not an option at all to acquire that lock
>>> everywhere else you access ->vpci (or else the vpci lock itself would be
>>> pointless). But a per-domain auxiliary r/w lock may help: Other paths
>>> would acquire it in read mode, and here you'd acquire it in write mode (in
>>> the former case around the vpci lock, while in the latter case there may
>>> then not be any need to acquire the individual vpci locks at all). FTAOD:
>>> I haven't fully thought through all implications (and hence whether this is
>>> viable in the first place); I expect you will, documenting what you've
>>> found in the resulting patch description. Of course the double lock
>>> acquire/release would then likely want hiding in helper functions.
>> I've been also thinking about this, and whether it's really worth to
>> have a per-device lock rather than a per-domain one that protects all
>> vpci regions of the devices assigned to the domain.
>>
>> The OS is likely to serialize accesses to the PCI config space anyway,
>> and the only place I could see a benefit of having per-device locks is
>> 
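
[Editorial note: the ABBA hazard discussed in this thread arises when one vCPU holds device A's lock and waits for B's while another holds B's and waits for A's, which is exactly what "lock tmp only when tmp != pdev" would permit. A classic textbook mitigation, shown below as an illustrative non-Xen pthreads sketch, is to impose a single global acquisition order, here by lock address, while still handling the tmp == pdev case with a single acquisition.]

```c
#include <pthread.h>
#include <stdint.h>

/* Acquire two per-device locks in a fixed global order (lower address
 * first), so two CPUs locking the same pair can never hold one each
 * and wait on the other.  a == b models the tmp == pdev case. */
void lock_two(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if (a == b) {
        pthread_mutex_lock(a);
        return;
    }
    if ((uintptr_t)a > (uintptr_t)b) {  /* normalise to the global order */
        pthread_mutex_t *t = a;
        a = b;
        b = t;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

void unlock_two(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (a != b)
        pthread_mutex_unlock(b);
}
```

Whether such pairwise ordering is preferable to the per-domain lock discussed above is exactly the trade-off the thread is weighing; ordering helpers add a rule every call site must obey, while a domain-level lock centralises it.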

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Roger Pau Monné
On Fri, Feb 04, 2022 at 11:37:50AM +, Oleksandr Andrushchenko wrote:
> 
> 
> On 04.02.22 13:13, Roger Pau Monné wrote:
> > On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> >> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> >>> On 04.02.22 11:15, Jan Beulich wrote:
>  On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> > On 04.02.22 09:52, Jan Beulich wrote:
> >> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> >>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
> >>>              continue;
> >>>          }
> >>> 
> >>> +        spin_lock(&tmp->vpci_lock);
> >>> +        if ( !tmp->vpci )
> >>> +        {
> >>> +            spin_unlock(&tmp->vpci_lock);
> >>> +            continue;
> >>> +        }
> >>>          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
> >>>          {
> >>>              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
> >>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
> >>>              rc = rangeset_remove_range(mem, start, end);
> >>>              if ( rc )
> >>>              {
> >>> +                spin_unlock(&tmp->vpci_lock);
> >>>                  printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
> >>>                         start, end, rc);
> >>>                  rangeset_destroy(mem);
> >>>                  return rc;
> >>>              }
> >>>          }
> >>> +        spin_unlock(&tmp->vpci_lock);
> >>>      }
> >> At the first glance this simply looks like another unjustified (in the
> >> description) change, as you're not converting anything here but you
> >> actually add locking (and I realize this was there before, so I'm sorry
> >> for not pointing this out earlier).
> > Well, I thought that the description already has "...the lock can be
> > used (and in a few cases is used right away) to check whether vpci
> > is present" and this is enough for such uses as here.
> >> But then I wonder whether you
> >> actually tested this, since I can't help getting the impression that
> >> you're introducing a live-lock: The function is called from cmd_write()
> >> and rom_write(), which in turn are called out of vpci_write(). Yet that
> >> function already holds the lock, and the lock is not (currently)
> >> recursive. (For the 3rd caller of the function - init_bars() - otoh
> >> the locking looks to be entirely unnecessary.)
> > Well, you are correct: if tmp != pdev then it is correct to acquire
> > the lock. But if tmp == pdev and rom_only == true
> > then we'll deadlock.
> >
> > It seems we need to have the locking conditional, e.g. only lock
> > if tmp != pdev
>  Which will address the live-lock, but introduce ABBA deadlock potential
>  between the two locks.
> >>> I am not sure I can suggest a better solution here
> >>> @Roger, @Jan, could you please help here?
> >> Well, first of all I'd like to mention that while it may have been okay to
> >> not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
> >> with DomU-s' lists of PCI devices. The requirement really applies to the
> >> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
> >> there it probably wants to be a try-lock.
> >>
> >> Next I'd like to point out that here we have the still pending issue of
> >> how to deal with hidden devices, which Dom0 can access. See my RFC patch
> >> "vPCI: account for hidden devices in modify_bars()". Whatever the solution
> >> here, I think it wants to at least account for the extra need there.
> > Yes, sorry, I should take care of that.
> >
> >> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
> >> the deadlock, as it's imo not an option at all to acquire that lock
> >> everywhere else you access ->vpci (or else the vpci lock itself would be
> >> pointless). But a per-domain auxiliary r/w lock may help: Other paths
> >> would acquire it in read mode, and here you'd acquire it in write mode (in
> >> the former case around the vpci lock, while in the latter case there may
> >> then not be any need to acquire the individual vpci locks at all). FTAOD:
> >> I haven't fully thought through all implications (and hence whether this is
> >> viable in the first place); I expect you will, documenting what you've
> >> found in the resulting patch description. Of course the double lock
> >> acquire/release would then likely want hiding in helper functions.
> > I've been also thinking about this, and whether it's really worth to
> > have a per-device lock rather than a per-domain one that protects all
> > vpci regions of the devices assigned to the domain.
> >
> > The OS is likely to 

Re: [PATCH] tools/guest: Fix comment regarding CPUID compatibility

2022-02-04 Thread Andrew Cooper
On 04/02/2022 08:31, Jan Beulich wrote:
> On 03.02.2022 19:10, Andrew Cooper wrote:
>> It was Xen 4.14 where CPUID data was added to the migration stream, and 4.13
>> that we need to worry about with regards to compatibility.  Xen 4.12 isn't
>> relevant.
>>
>> Expand and correct the commentary.
>>
>> Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore")
> But doesn't this commit amend 685e922d6f30 ("tools/libxc: Rework
> xc_cpuid_apply_policy() to use {get,set}_cpu_policy()"), which is
> where DEF_MAX_* disappeared?

No. All that happened in that change was that we switched to using

cpuid.h:89:#define CPUID_GUEST_NR_EXTD_AMD

instead, which remained the same size until Xen 4.15 when e9b4fe26364
bumped it.

> While looking at this, wasn't Roger's change incomplete, in that
> for Intel the extended leaf upper bound was 0x80000008 in 4.12?

CPUID_GUEST_NR_EXTD_INTEL is still 8, so this is all fine.

Even if hardware has more, we'll still limit to 8 on Intel.  That said,
to date Intel haven't bumped the limit, so we're approaching 2
decades without change.

Should this change, then yes, we'd have to adjust the compat path.

~Andrew
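
[Editorial note: the compat behaviour described above, i.e. capping an Intel guest at `CPUID_GUEST_NR_EXTD_INTEL` (8) extended leaves regardless of what hardware reports, could be sketched as below. The helper name and the clamping shape are illustrative assumptions, not libxc/Xen code.]

```c
#include <stdint.h>

#define CPUID_GUEST_NR_EXTD_INTEL 8   /* leaves 0x80000000..0x80000008 */

/* Clamp the advertised maximum extended CPUID leaf: never expose more
 * than the guest policy allows, nor more than hardware actually has. */
uint32_t clamp_extd_max(uint32_t hw_max_leaf, unsigned int nr_guest_leaves)
{
    uint32_t guest_max = 0x80000000u + nr_guest_leaves;

    return hw_max_leaf < guest_max ? hw_max_leaf : guest_max;
}
```

With this shape, a host reporting leaves up to 0x80000021 would still present 0x80000008 to an Intel guest, matching the "still limit to 8" behaviour described above.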


Re: [PATCH] xen: Modify domain_crash() to take a print string

2022-02-04 Thread Andrew Cooper
On 03/02/2022 15:06, Jan Beulich wrote:
> On 03.02.2022 15:41, Andrew Cooper wrote:
>> On 03/02/2022 14:19, Jan Beulich wrote:
>>> On 03.02.2022 15:11, Andrew Cooper wrote:
 On 03/02/2022 13:48, Julien Grall wrote:
> On 03/02/2022 13:38, Andrew Cooper wrote:
>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>> index 37f78cc4c4c9..38b390d20371 100644
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v);
>>    * from any processor.
>>    */
>>   void __domain_crash(struct domain *d);
>> -#define domain_crash(d) do {                                      \
>> -    printk("domain_crash called from %s:%d\n", __FILE__, __LINE__); \
>> -    __domain_crash(d);                                            \
>> -} while (0)
>> +#define domain_crash(d, ...)    \
>> +    do {    \
>> +    if ( count_args(__VA_ARGS__) == 0 ) \
>> +    printk("domain_crash called from %s:%d\n",  \
>> +   __FILE__, __LINE__); \
> I find a bit odd that here you are using a normal printk
 That's unmodified from before.  Only reformatted.

> but...
>
>
>> +    else    \
>> +    printk(XENLOG_G_ERR __VA_ARGS__);   \
> here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so,
> wouldn't it be better to only use XENLOG_ERR so they can always be
> seen? (A domain shouldn't be able to abuse it).
 Perhaps.  I suppose it is more important information than pretty much
 anything else about the guest.
>>> Indeed, but then - is this really an error in all cases?
>> Yes.  It is always a fatal event for the VM.
> Which may or may not be Xen's fault. If the guest put itself in a bad
> state, I don't see why we shouldn't consider such just a warning.

Log level is the severity of the action, not who's potentially to blame
for causing the situation.

Furthermore, almost all callers who do emit appropriate diagnostics
before domain_crash() already use ERR.

And, as already pointed out, it doesn't matter.  The line is going out
on the console however you want to try and bikeshed the internals.

>  IOW
> I continue to think a log level, if so wanted, should be supplied by
> the user of the macro.

That undermines the whole purpose of preventing callers from being able
to omit diagnostics.

~Andrew



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Oleksandr Andrushchenko


On 04.02.22 13:13, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>>> On 04.02.22 11:15, Jan Beulich wrote:
 On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> On 04.02.22 09:52, Jan Beulich wrote:
>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>>              continue;
>>>          }
>>> 
>>> +        spin_lock(&tmp->vpci_lock);
>>> +        if ( !tmp->vpci )
>>> +        {
>>> +            spin_unlock(&tmp->vpci_lock);
>>> +            continue;
>>> +        }
>>>          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>>          {
>>>              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>>              rc = rangeset_remove_range(mem, start, end);
>>>              if ( rc )
>>>              {
>>> +                spin_unlock(&tmp->vpci_lock);
>>>                  printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>>>                         start, end, rc);
>>>                  rangeset_destroy(mem);
>>>                  return rc;
>>>              }
>>>          }
>>> +        spin_unlock(&tmp->vpci_lock);
>>>      }
>> At the first glance this simply looks like another unjustified (in the
>> description) change, as you're not converting anything here but you
>> actually add locking (and I realize this was there before, so I'm sorry
>> for not pointing this out earlier).
> Well, I thought that the description already has "...the lock can be
> used (and in a few cases is used right away) to check whether vpci
> is present" and this is enough for such uses as here.
>> But then I wonder whether you
>> actually tested this, since I can't help getting the impression that
>> you're introducing a live-lock: The function is called from cmd_write()
>> and rom_write(), which in turn are called out of vpci_write(). Yet that
>> function already holds the lock, and the lock is not (currently)
>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>> the locking looks to be entirely unnecessary.)
> Well, you are correct: if tmp != pdev then it is correct to acquire
> the lock. But if tmp == pdev and rom_only == true
> then we'll deadlock.
>
> It seems we need to have the locking conditional, e.g. only lock
> if tmp != pdev
 Which will address the live-lock, but introduce ABBA deadlock potential
 between the two locks.
>>> I am not sure I can suggest a better solution here
>>> @Roger, @Jan, could you please help here?
>> Well, first of all I'd like to mention that while it may have been okay to
>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
>> with DomU-s' lists of PCI devices. The requirement really applies to the
>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
>> there it probably wants to be a try-lock.
>>
>> Next I'd like to point out that here we have the still pending issue of
>> how to deal with hidden devices, which Dom0 can access. See my RFC patch
>> "vPCI: account for hidden devices in modify_bars()". Whatever the solution
>> here, I think it wants to at least account for the extra need there.
> Yes, sorry, I should take care of that.
>
>> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
>> the deadlock, as it's imo not an option at all to acquire that lock
>> everywhere else you access ->vpci (or else the vpci lock itself would be
>> pointless). But a per-domain auxiliary r/w lock may help: Other paths
>> would acquire it in read mode, and here you'd acquire it in write mode (in
>> the former case around the vpci lock, while in the latter case there may
>> then not be any need to acquire the individual vpci locks at all). FTAOD:
>> I haven't fully thought through all implications (and hence whether this is
>> viable in the first place); I expect you will, documenting what you've
>> found in the resulting patch description. Of course the double lock
>> acquire/release would then likely want hiding in helper functions.
> I've also been thinking about this, and whether it's really worth
> having a per-device lock rather than a per-domain one that protects all
> vpci regions of the devices assigned to the domain.
>
> The OS is likely to serialize accesses to the PCI config space anyway,
> and the only place I could see a benefit of having per-device locks is
> in the handling of MSI-X tables, as the handling of the mask bit is
> likely very performance sensitive, so adding a per-domain lock there
> could be a bottleneck.

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Jan Beulich
On 04.02.2022 12:13, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>>> On 04.02.22 11:15, Jan Beulich wrote:
 On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> On 04.02.22 09:52, Jan Beulich wrote:
>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
>>> uint16_t cmd, bool rom_only)
>>>continue;
>>>}
>>>
>>> +        spin_lock(&tmp->vpci_lock);
>>> +        if ( !tmp->vpci )
>>> +        {
>>> +            spin_unlock(&tmp->vpci_lock);
>>> +            continue;
>>> +        }
>>>          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>>          {
>>>              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev
>>> *pdev, uint16_t cmd, bool rom_only)
>>>                  rc = rangeset_remove_range(mem, start, end);
>>>                  if ( rc )
>>>                  {
>>> +                    spin_unlock(&tmp->vpci_lock);
>>>                      printk(XENLOG_G_WARNING "Failed to remove [%lx,
>>> %lx]: %d\n",
>>>                             start, end, rc);
>>>                      rangeset_destroy(mem);
>>>                      return rc;
>>>                  }
>>>              }
>>> +        spin_unlock(&tmp->vpci_lock);
>>>}
>> At the first glance this simply looks like another unjustified (in the
>> description) change, as you're not converting anything here but you
>> actually add locking (and I realize this was there before, so I'm sorry
>> for not pointing this out earlier).
> Well, I thought that the description already has "...the lock can be
> used (and in a few cases is used right away) to check whether vpci
> is present" and this is enough for such uses as here.
>>But then I wonder whether you
>> actually tested this, since I can't help getting the impression that
>> you're introducing a live-lock: The function is called from cmd_write()
>> and rom_write(), which in turn are called out of vpci_write(). Yet that
>> function already holds the lock, and the lock is not (currently)
>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>> the locking looks to be entirely unnecessary.)
> Well, you are correct: if tmp != pdev then it is correct to acquire
> the lock. But if tmp == pdev and rom_only == true
> then we'll deadlock.
>
> It seems we need to have the locking conditional, e.g. only lock
> if tmp != pdev
 Which will address the live-lock, but introduce ABBA deadlock potential
 between the two locks.
>>> I am not sure I can suggest a better solution here
>>> @Roger, @Jan, could you please help here?
>>
>> Well, first of all I'd like to mention that while it may have been okay to
>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
>> with DomU-s' lists of PCI devices. The requirement really applies to the
>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
>> there it probably wants to be a try-lock.
>>
>> Next I'd like to point out that here we have the still pending issue of
>> how to deal with hidden devices, which Dom0 can access. See my RFC patch
>> "vPCI: account for hidden devices in modify_bars()". Whatever the solution
>> here, I think it wants to at least account for the extra need there.
> 
> Yes, sorry, I should take care of that.
> 
>> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
>> the deadlock, as it's imo not an option at all to acquire that lock
>> everywhere else you access ->vpci (or else the vpci lock itself would be
>> pointless). But a per-domain auxiliary r/w lock may help: Other paths
>> would acquire it in read mode, and here you'd acquire it in write mode (in
>> the former case around the vpci lock, while in the latter case there may
>> then not be any need to acquire the individual vpci locks at all). FTAOD:
>> I haven't fully thought through all implications (and hence whether this is
>> viable in the first place); I expect you will, documenting what you've
>> found in the resulting patch description. Of course the double lock
>> acquire/release would then likely want hiding in helper functions.
> 
> I've also been thinking about this, and whether it's really worth
> having a per-device lock rather than a per-domain one that protects all
> vpci regions of the devices assigned to the domain.
> 
> The OS is likely to serialize accesses to the PCI config space anyway,
> and the only place I could see a benefit of having per-device locks is
> in the handling of MSI-X tables, as the handling of the mask bit is
> likely very performance sensitive, so adding a per-domain lock there
> could be a bottleneck.

Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole

2022-02-04 Thread Oleksandr Andrushchenko


On 04.02.22 13:00, Julien Grall wrote:
>
>
> On 04/02/2022 10:35, Oleksandr Andrushchenko wrote:
>>
>>
>> On 04.02.22 11:57, Julien Grall wrote:
>>> Hi,
>>>
>>> On 04/02/2022 09:47, Oleksandr Andrushchenko wrote:
>> Could you please help me with the exact message you would like to see?
>
> Here is a summary of the discussion (+ some of my follow-up thoughts):
>
> is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c
> "xen/pci: detect when BARs are not suitably positioned") to check whether
> the BARs are positioned outside of a valid memory range. This was
> introduced to work around quirky firmware.
>
> In theory, this could also happen on Arm. In practice, it may not
> happen, but it sounds better to sanity check that the BAR contains a
> "valid" I/O range.
>
> On x86, this is implemented by checking that the region is not described
> in the e820. IIUC, on Arm, the BARs have to be positioned in pre-defined
> ranges. So I think it would be possible to implement is_memory_hole() by
> going through the list of hostbridges and checking the ranges.
>
> But first, I'd like to confirm my understanding with Rahul, and others.
>
> If we were going to go this route, I would also rename the function to
> better match what it is doing (i.e. it checks the BAR is correctly
> placed). As a potential optimization/hardening for Arm, we could pass
> the hostbridge so we don't have to walk all of them.
 It seems this needs to live in the commit message then? So, it is easy to 
 find
 as everything after "---" is going to be dropped on commit
>>> I expect the function to be fully implemented before this will be merged.
>>>
>>> So if it is fully implemented, then a fair chunk of what I wrote would not 
>>> be necessary to carry in the commit message.
>> Well, we started from wanting *something* with a TODO, and now
>> you request that it be fully implemented before it is merged.
>
> I don't think I ever suggested this patch would be merged as-is. Sorry if
> it came across like this.
Np
>
> Instead, my intent in asking you to send a TODO patch was to start a
> discussion about how this function could be implemented for Arm.
>
> You sent a TODO but you didn't provide any summary of what the issue is and
> what we want to achieve... Hence my request to add a bit more detail so the
> other reviewers can provide their opinions more easily.
Ok, so we can discuss it here, but I won't have this patch in v7
>
> Cheers,
>
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Roger Pau Monné
On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> > On 04.02.22 11:15, Jan Beulich wrote:
> >> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> >>> On 04.02.22 09:52, Jan Beulich wrote:
>  On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> > @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
> > uint16_t cmd, bool rom_only)
> >continue;
> >}
> >
> > +        spin_lock(&tmp->vpci_lock);
> > +        if ( !tmp->vpci )
> > +        {
> > +            spin_unlock(&tmp->vpci_lock);
> > +            continue;
> > +        }
> >          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
> >          {
> >              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
> > @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev
> > *pdev, uint16_t cmd, bool rom_only)
> >                  rc = rangeset_remove_range(mem, start, end);
> >                  if ( rc )
> >                  {
> > +                    spin_unlock(&tmp->vpci_lock);
> >                      printk(XENLOG_G_WARNING "Failed to remove [%lx,
> > %lx]: %d\n",
> >                             start, end, rc);
> >                      rangeset_destroy(mem);
> >                      return rc;
> >                  }
> >              }
> > +        spin_unlock(&tmp->vpci_lock);
> >}
>  At the first glance this simply looks like another unjustified (in the
>  description) change, as you're not converting anything here but you
>  actually add locking (and I realize this was there before, so I'm sorry
>  for not pointing this out earlier).
> >>> Well, I thought that the description already has "...the lock can be
> >>> used (and in a few cases is used right away) to check whether vpci
> >>> is present" and this is enough for such uses as here.
> But then I wonder whether you
>  actually tested this, since I can't help getting the impression that
>  you're introducing a live-lock: The function is called from cmd_write()
>  and rom_write(), which in turn are called out of vpci_write(). Yet that
>  function already holds the lock, and the lock is not (currently)
>  recursive. (For the 3rd caller of the function - init_bars() - otoh
>  the locking looks to be entirely unnecessary.)
> >>> Well, you are correct: if tmp != pdev then it is correct to acquire
> >>> the lock. But if tmp == pdev and rom_only == true
> >>> then we'll deadlock.
> >>>
> >>> It seems we need to have the locking conditional, e.g. only lock
> >>> if tmp != pdev
> >> Which will address the live-lock, but introduce ABBA deadlock potential
> >> between the two locks.
> > I am not sure I can suggest a better solution here
> > @Roger, @Jan, could you please help here?
> 
> Well, first of all I'd like to mention that while it may have been okay to
> not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
> with DomU-s' lists of PCI devices. The requirement really applies to the
> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
> there it probably wants to be a try-lock.
> 
> Next I'd like to point out that here we have the still pending issue of
> how to deal with hidden devices, which Dom0 can access. See my RFC patch
> "vPCI: account for hidden devices in modify_bars()". Whatever the solution
> here, I think it wants to at least account for the extra need there.

Yes, sorry, I should take care of that.

> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
> the deadlock, as it's imo not an option at all to acquire that lock
> everywhere else you access ->vpci (or else the vpci lock itself would be
> pointless). But a per-domain auxiliary r/w lock may help: Other paths
> would acquire it in read mode, and here you'd acquire it in write mode (in
> the former case around the vpci lock, while in the latter case there may
> then not be any need to acquire the individual vpci locks at all). FTAOD:
> I haven't fully thought through all implications (and hence whether this is
> viable in the first place); I expect you will, documenting what you've
> found in the resulting patch description. Of course the double lock
> acquire/release would then likely want hiding in helper functions.

I've also been thinking about this, and whether it's really worth
having a per-device lock rather than a per-domain one that protects all
vpci regions of the devices assigned to the domain.

The OS is likely to serialize accesses to the PCI config space anyway,
and the only place I could see a benefit of having per-device locks is
in the handling of MSI-X tables, as the handling of the mask bit is
likely very performance sensitive, so adding a per-domain lock there
could be a bottleneck.

We could alternatively do a per-domain rwlock for vpci and special case
the 

[xen-unstable test] 168001: tolerable FAIL - PUSHED

2022-02-04 Thread osstest service owner
flight 168001 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168001/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167994
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 167994
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167994
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167994
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 167994
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 167994
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 167994
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167994
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167994
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 167994
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167994
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167994
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 xen  75cc460a1b8cfd8e5d2c4302234ee194defe4872
baseline version:
 xen  

Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole

2022-02-04 Thread Julien Grall




On 04/02/2022 10:35, Oleksandr Andrushchenko wrote:



On 04.02.22 11:57, Julien Grall wrote:

Hi,

On 04/02/2022 09:47, Oleksandr Andrushchenko wrote:

Could you please help me with the exact message you would like to see?


Here is a summary of the discussion (+ some of my follow-up thoughts):

is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c "xen/pci:
detect when BARs are not suitably positioned") to check whether the BARs are
positioned outside of a valid memory range. This was introduced to work around
quirky firmware.

In theory, this could also happen on Arm. In practice, it may not happen, but it
sounds better to sanity check that the BAR contains a "valid" I/O range.

On x86, this is implemented by checking that the region is not described in the
e820. IIUC, on Arm, the BARs have to be positioned in pre-defined ranges. So I
think it would be possible to implement is_memory_hole() by going through the
list of hostbridges and checking the ranges.

But first, I'd like to confirm my understanding with Rahul, and others.

If we were going to go this route, I would also rename the function to better
match what it is doing (i.e. it checks the BAR is correctly placed). As a
potential optimization/hardening for Arm, we could pass the hostbridge so we
don't have to walk all of them.

It seems this needs to live in the commit message then? So, it is easy to find
as everything after "---" is going to be dropped on commit

I expect the function to be fully implemented before this will be merged.

So if it is fully implemented, then a fair chunk of what I wrote would not be 
necessary to carry in the commit message.

Well, we started from wanting *something* with a TODO, and now
you request that it be fully implemented before it is merged.


I don't think I ever suggested this patch would be merged as-is. Sorry
if it came across like this.


Instead, my intent in asking you to send a TODO patch was to start a
discussion about how this function could be implemented for Arm.


You sent a TODO but you didn't provide any summary of what the issue is and
what we want to achieve... Hence my request to add a bit more detail so
the other reviewers can provide their opinions more easily.


Cheers,

--
Julien Grall



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Roger Pau Monné
On Fri, Feb 04, 2022 at 10:12:46AM +, Oleksandr Andrushchenko wrote:
> Hi, Jan!
> 
> On 04.02.22 11:15, Jan Beulich wrote:
> > On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> >> On 04.02.22 09:52, Jan Beulich wrote:
> >>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>  @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
>  uint16_t cmd, bool rom_only)
> continue;
> }
> 
>  +        spin_lock(&tmp->vpci_lock);
>  +        if ( !tmp->vpci )
>  +        {
>  +            spin_unlock(&tmp->vpci_lock);
>  +            continue;
>  +        }
>           for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>           {
>               const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>  @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev,
>  uint16_t cmd, bool rom_only)
>                   rc = rangeset_remove_range(mem, start, end);
>                   if ( rc )
>                   {
>  +                    spin_unlock(&tmp->vpci_lock);
>                       printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]:
>  %d\n",
>                              start, end, rc);
>                       rangeset_destroy(mem);
>                       return rc;
>                   }
>               }
>  +        spin_unlock(&tmp->vpci_lock);
> }
> >>> At the first glance this simply looks like another unjustified (in the
> >>> description) change, as you're not converting anything here but you
> >>> actually add locking (and I realize this was there before, so I'm sorry
> >>> for not pointing this out earlier).
> >> Well, I thought that the description already has "...the lock can be
> >> used (and in a few cases is used right away) to check whether vpci
> >> is present" and this is enough for such uses as here.
> >>>But then I wonder whether you
> >>> actually tested this, since I can't help getting the impression that
> >>> you're introducing a live-lock: The function is called from cmd_write()
> >>> and rom_write(), which in turn are called out of vpci_write(). Yet that
> >>> function already holds the lock, and the lock is not (currently)
> >>> recursive. (For the 3rd caller of the function - init_bars() - otoh
> >>> the locking looks to be entirely unnecessary.)
> >> Well, you are correct: if tmp != pdev then it is correct to acquire
> >> the lock. But if tmp == pdev and rom_only == true
> >> then we'll deadlock.
> >>
> >> It seems we need to have the locking conditional, e.g. only lock
> >> if tmp != pdev
> > Which will address the live-lock, but introduce ABBA deadlock potential
> > between the two locks.
> I am not sure I can suggest a better solution here
> @Roger, @Jan, could you please help here?

I think we could set the locking order based on the memory address of
the locks, ie:

if ( &tmp->vpci_lock < &pdev->vpci_lock )
{
    spin_unlock(&pdev->vpci_lock);
    spin_lock(&tmp->vpci_lock);
    spin_lock(&pdev->vpci_lock);
    if ( !pdev->vpci || &pdev->vpci->header != header )
        /* ERROR: vpci removed or recreated. */
}
else
    spin_lock(&tmp->vpci_lock);

That however creates a window where the address of the BARs on the
current device (pdev) could be changed, so the result of the mapping
might be skewed. I think the guest would only have itself to blame for
that, as changing the position of the BARs while toggling memory
decoding is not something sensible to do.

> >
>  @@ -222,10 +239,10 @@ static int msix_read(struct vcpu *v, unsigned long 
>  addr, unsigned int len,
> break;
> }
> 
>  +msix_put(msix);
> return X86EMUL_OKAY;
> }
> 
>  -    spin_lock(&msix->pdev->vpci->lock);
> entry = get_entry(msix, addr);
> offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
> >>> You're increasing the locked region quite a bit here. If this is really
> >>> needed, it wants explaining. And if this is deemed acceptable as a
> >>> "side effect", it wants justifying or at least stating imo. Same for
> >>> msix_write() then, obviously.
> >> Yes, I do increase the locking region here, but the msix variable needs
> >> to be protected all the time, so it seems to be obvious that it remains
> >> under the lock
> > What does the msix variable have to do with the vPCI lock? If you see
> > a need to grow the locked region here, then surely this is independent
> > of your conversion of the lock, and hence wants to be a prereq fix
> > (which may in fact want/need backporting).
> First of all, the implementation of msix_get is wrong and needs to be:
> 
> /*
>   * Note: if vpci_msix found, then this function returns with
>   * pdev->vpci_lock held. Use msix_put to unlock.
>   */
> static struct vpci_msix *msix_get(const struct domain *d, unsigned long addr)
> {
>      struct vpci_msix *msix;
> 
>      list_for_each_entry ( msix, &d->arch.hvm.msix_tables, next )


Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Jan Beulich
On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> On 04.02.22 11:15, Jan Beulich wrote:
>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>>> On 04.02.22 09:52, Jan Beulich wrote:
 On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
> uint16_t cmd, bool rom_only)
>continue;
>}
>
> +        spin_lock(&tmp->vpci_lock);
> +        if ( !tmp->vpci )
> +        {
> +            spin_unlock(&tmp->vpci_lock);
> +            continue;
> +        }
>          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>          {
>              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev,
> uint16_t cmd, bool rom_only)
>                  rc = rangeset_remove_range(mem, start, end);
>                  if ( rc )
>                  {
> +                    spin_unlock(&tmp->vpci_lock);
>                      printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]:
> %d\n",
>                             start, end, rc);
>                      rangeset_destroy(mem);
>                      return rc;
>                  }
>              }
> +        spin_unlock(&tmp->vpci_lock);
>}
 At the first glance this simply looks like another unjustified (in the
 description) change, as you're not converting anything here but you
 actually add locking (and I realize this was there before, so I'm sorry
 for not pointing this out earlier).
>>> Well, I thought that the description already has "...the lock can be
>>> used (and in a few cases is used right away) to check whether vpci
>>> is present" and this is enough for such uses as here.
But then I wonder whether you
 actually tested this, since I can't help getting the impression that
 you're introducing a live-lock: The function is called from cmd_write()
 and rom_write(), which in turn are called out of vpci_write(). Yet that
 function already holds the lock, and the lock is not (currently)
 recursive. (For the 3rd caller of the function - init_bars() - otoh
 the locking looks to be entirely unnecessary.)
>>> Well, you are correct: if tmp != pdev then it is correct to acquire
>>> the lock. But if tmp == pdev and rom_only == true
>>> then we'll deadlock.
>>>
>>> It seems we need to have the locking conditional, e.g. only lock
>>> if tmp != pdev
>> Which will address the live-lock, but introduce ABBA deadlock potential
>> between the two locks.
> I am not sure I can suggest a better solution here
> @Roger, @Jan, could you please help here?

Well, first of all I'd like to mention that while it may have been okay to
not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
with DomU-s' lists of PCI devices. The requirement really applies to the
other use of for_each_pdev() as well (in vpci_dump_msi()), except that
there it probably wants to be a try-lock.

Next I'd like to point out that here we have the still pending issue of
how to deal with hidden devices, which Dom0 can access. See my RFC patch
"vPCI: account for hidden devices in modify_bars()". Whatever the solution
here, I think it wants to at least account for the extra need there.

Now it is quite clear that pcidevs_lock isn't going to help with avoiding
the deadlock, as it's imo not an option at all to acquire that lock
everywhere else you access ->vpci (or else the vpci lock itself would be
pointless). But a per-domain auxiliary r/w lock may help: Other paths
would acquire it in read mode, and here you'd acquire it in write mode (in
the former case around the vpci lock, while in the latter case there may
then not be any need to acquire the individual vpci locks at all). FTAOD:
I haven't fully thought through all implications (and hence whether this is
viable in the first place); I expect you will, documenting what you've
found in the resulting patch description. Of course the double lock
acquire/release would then likely want hiding in helper functions.

Jan




Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole

2022-02-04 Thread Oleksandr Andrushchenko


On 04.02.22 11:57, Julien Grall wrote:
> Hi,
>
> On 04/02/2022 09:47, Oleksandr Andrushchenko wrote:
 Could you please help me with the exact message you would like to see?
>>>
>>> Here is a summary of the discussion (+ some of my follow-up thoughts):
>>>
>>> is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c
>>> "xen/pci: detect when BARs are not suitably positioned") to check whether
>>> the BARs are positioned outside of a valid memory range. This was introduced
>>> to work around quirky firmware.
>>>
>>> In theory, this could also happen on Arm. In practice, it may not happen,
>>> but it sounds better to sanity check that the BAR contains a "valid" I/O
>>> range.
>>>
>>> On x86, this is implemented by checking that the region is not described in
>>> the e820. IIUC, on Arm, the BARs have to be positioned in pre-defined
>>> ranges. So I think it would be possible to implement is_memory_hole() by
>>> going through the list of hostbridges and checking the ranges.
>>>
>>> But first, I'd like to confirm my understanding with Rahul, and others.
>>>
>>> If we were going to go this route, I would also rename the function to
>>> better match what it is doing (i.e. it checks the BAR is correctly placed).
>>> As a potential optimization/hardening for Arm, we could pass the
>>> hostbridge so we don't have to walk all of them.
>> It seems this needs to live in the commit message then? So, it is easy to 
>> find
>> as everything after "---" is going to be dropped on commit
> I expect the function to be fully implemented before this will be merged.
>
> So if it is fully implemented, then a fair chunk of what I wrote would not be 
> necessary to carry in the commit message.
Well, we started from wanting *something* with a TODO, and now
you request that it be fully implemented before it is merged.
What am I missing here?
>
> Cheers,
>
Thank you,
Oleksandr

Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field

2022-02-04 Thread Jan Beulich
On 04.02.2022 10:54, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 10:30:54AM +0100, Jan Beulich wrote:
>> On 04.02.2022 10:23, Roger Pau Monné wrote:
>>> On Mon, Jan 24, 2022 at 02:47:58PM +0100, Jan Beulich wrote:
 On 20.01.2022 16:23, Roger Pau Monne wrote:
> --- a/xen/arch/x86/include/asm/msi.h
> +++ b/xen/arch/x86/include/asm/msi.h
> @@ -54,6 +54,7 @@
>  #define MSI_ADDR_DEST_ID_SHIFT   12
>  #define   MSI_ADDR_DEST_ID_MASK  0x00ff000
>  #define  MSI_ADDR_DEST_ID(dest)  (((dest) << 
> MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK)
> +#define   MSI_ADDR_EXT_DEST_ID_MASK  0xfe0

 Especially the immediately preceding macro now becomes kind of stale.
>>>
>>> Hm, I'm not so sure about that. We could expand the macro to place the
>>> high bits in dest at bits 11:4 of the resulting address. However that
>>> macro (MSI_ADDR_DEST_ID) is only used by Xen to compose its own
>>> messages, so until we add support for the hypervisor itself to use the
>>> extended destination ID mode there's not much point in modifying the
>>> macro IMO.
>>
>> Well, this is all unhelpful considering the different form of extended
>> ID in Intel's doc. At least by way of a comment things need clarifying
>> and potential pitfalls need pointing out imo.
> 
> Sure, will add some comments there.
> 
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -588,6 +588,7 @@ struct xen_domctl_bind_pt_irq {
>  #define XEN_DOMCTL_VMSI_X86_DELIV_MASK   0x007000
>  #define XEN_DOMCTL_VMSI_X86_TRIG_MASK0x008000
>  #define XEN_DOMCTL_VMSI_X86_UNMASKED 0x01
> +#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK 0xfe

 I'm not convinced it is a good idea to limit the overall destination
 ID width to 15 bits here - at the interface level we could as well
 permit more bits right away; the implementation would reject too high
 a value, of course. Not only with this I further wonder whether the
 field shouldn't be unsplit while extending it. You won't get away
 without bumping XEN_DOMCTL_INTERFACE_VERSION anyway (unless it was
 bumped already for 4.17) since afaics the unused bits of this field
 previously weren't checked for being zero. We could easily have 8
 bits vector, 16 bits flags, and 32 bits destination ID in the struct.
 And there would then still be 8 unused bits (which from now on we
 ought to check for being zero).
>>>
>>> So I've made gflags a 64bit field, used the high 32bits for the
>>> destination ID, and the low ones for flags. I've left gvec as a
>>> separate field in the struct, as I don't want to require a
>>> modification to QEMU, just a rebuild against the updated headers will
>>> be enough.
>>
>> Hmm, wait - if qemu uses this without going through a suitable
>> abstraction in at least libxc, then we cannot _ever_ change this
>> interface: If a rebuild was required, old qemu binaries would
>> stop working with newer Xen. If such a dependency indeed exists,
>> we'll need a prominent warning comment in the public header.
> 
> Hm, it's bad. The xc_domain_update_msi_irq interface uses a gflags
> parameter that's the gflags parameter of xen_domctl_bind_pt_irq. Which
> is even worse because it's not using the mask definitions from
> domctl.h, but rather a copy of them named XEN_PT_GFLAGS_* that are
> hardcoded in xen_pt_msi.c in QEMU code.
> 
> So we can likely expand the layout of gflags, but moving fields is not
> an option. I think my original proposal of adding a
> XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK mask is the less bad option until
> we add a new stable interface for device passthrough for QEMU.

Given the observations - yeah, not much of a choice left.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Oleksandr Andrushchenko
Hi, Jan!

On 04.02.22 11:15, Jan Beulich wrote:
> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>> On 04.02.22 09:52, Jan Beulich wrote:
>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
 @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
 uint16_t cmd, bool rom_only)
continue;
}

 +spin_lock(&tmp->vpci_lock);
 +if ( !tmp->vpci )
 +{
 +spin_unlock(&tmp->vpci_lock);
 +continue;
 +}
for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
{
const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
 @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, 
 uint16_t cmd, bool rom_only)
rc = rangeset_remove_range(mem, start, end);
if ( rc )
{
 +spin_unlock(&tmp->vpci_lock);
printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: 
 %d\n",
   start, end, rc);
rangeset_destroy(mem);
return rc;
}
}
 +spin_unlock(&tmp->vpci_lock);
}
>>> At the first glance this simply looks like another unjustified (in the
>>> description) change, as you're not converting anything here but you
>>> actually add locking (and I realize this was there before, so I'm sorry
>>> for not pointing this out earlier).
>> Well, I thought that the description already has "...the lock can be
>> used (and in a few cases is used right away) to check whether vpci
>> is present" and this is enough for such uses as here.
>>> But then I wonder whether you
>>> actually tested this, since I can't help getting the impression that
>>> you're introducing a live-lock: The function is called from cmd_write()
>>> and rom_write(), which in turn are called out of vpci_write(). Yet that
>>> function already holds the lock, and the lock is not (currently)
>>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>>> the locking looks to be entirely unnecessary.)
>> Well, you are correct: if tmp != pdev then it is correct to acquire
>> the lock. But if tmp == pdev and rom_only == true
>> then we'll deadlock.
>>
>> It seems we need to have the locking conditional, e.g. only lock
>> if tmp != pdev
> Which will address the live-lock, but introduce ABBA deadlock potential
> between the two locks.
I am not sure I can suggest a better solution here
@Roger, @Jan, could you please help here?
>
 @@ -222,10 +239,10 @@ static int msix_read(struct vcpu *v, unsigned long 
 addr, unsigned int len,
break;
}

 +msix_put(msix);
return X86EMUL_OKAY;
}

 -spin_lock(&msix->pdev->vpci->lock);
entry = get_entry(msix, addr);
offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
>>> You're increasing the locked region quite a bit here. If this is really
>>> needed, it wants explaining. And if this is deemed acceptable as a
>>> "side effect", it wants justifying or at least stating imo. Same for
>>> msix_write() then, obviously.
>> Yes, I do increase the locking region here, but the msix variable needs
>> to be protected all the time, so it seems to be obvious that it remains
>> under the lock
> What does the msix variable have to do with the vPCI lock? If you see
> a need to grow the locked region here, then surely this is independent
> of your conversion of the lock, and hence wants to be a prereq fix
> (which may in fact want/need backporting).
First of all, the implementation of msix_get is wrong and needs to be:

/*
  * Note: if vpci_msix found, then this function returns with
  * pdev->vpci_lock held. Use msix_put to unlock.
  */
static struct vpci_msix *msix_get(const struct domain *d, unsigned long addr)
{
     struct vpci_msix *msix;

     list_for_each_entry ( msix, &d->arch.hvm.msix_tables, next )
     {
     const struct vpci_bar *bars;
     unsigned int i;

     spin_lock(&msix->pdev->vpci_lock);
     if ( !msix->pdev->vpci )
     {
     spin_unlock(&msix->pdev->vpci_lock);
     continue;
     }

     bars = msix->pdev->vpci->header.bars;
     for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
     if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled &&
  VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) )
     return msix;

     spin_unlock(&msix->pdev->vpci_lock);
     }

     return NULL;
}

Then, both msix_{read|write} can dereference msix->pdev->vpci early,
which is why Roger suggested we move to msix_{get|put} here.
And yes, we grow the locked region here and yes this might want a
prereq fix. Or just be fixed while at it.
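[For completeness, the msix_put() counterpart referred to in the comment above would presumably be just the unlock; a sketch against the names used in this series, not compilable standalone:]

```c
/*
 * Sketch of the counterpart to msix_get(): drop the per-device lock
 * taken by a successful msix_get().  Names follow the series under
 * discussion; this is illustrative only.
 */
static void msix_put(struct vpci_msix *msix)
{
    if ( msix )
        spin_unlock(&msix->pdev->vpci_lock);
}
```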

>
 @@ -327,7 +334,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, 
 unsigned int 

Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole

2022-02-04 Thread Julien Grall

Hi,

On 04/02/2022 09:47, Oleksandr Andrushchenko wrote:

Could you please help me with the exact message you would like to see?


Here is a summary of the discussion (+ some of my follow-up thoughts):

is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c "xen/pci: 
detect when BARs are not suitably positioned") to check whether the BARs are 
positioned outside of a valid memory range. This was introduced to work around quirky 
firmware.

In theory, this could also happen on Arm. In practice, this may not happen but it sounds 
better to sanity-check that the BAR contains a "valid" I/O range.

On x86, this is implemented by checking the region is not described in the 
e820. IIUC, on Arm, the BARs have to be positioned in pre-defined ranges. So I 
think it would be possible to implement is_memory_hole() by going through the 
list of hostbridges and checking the ranges.

But first, I'd like to confirm my understanding with Rahul, and others.

If we were going to go this route, I would also rename the function to 
better match what it is doing (i.e. it checks the BAR is correctly placed). As 
a potential optimization/hardening for Arm, we could pass the hostbridge so 
we don't have to walk all of them.

It seems this needs to live in the commit message then? That way it is easy to find,
as everything after "---" is going to be dropped on commit.

I expect the function to be fully implemented before this will be merged.

So if it is fully implemented, then a fair chunk of what I wrote would 
not be necessary to carry in the commit message.


Cheers,

--
Julien Grall



Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field

2022-02-04 Thread Roger Pau Monné
On Fri, Feb 04, 2022 at 10:30:54AM +0100, Jan Beulich wrote:
> On 04.02.2022 10:23, Roger Pau Monné wrote:
> > On Mon, Jan 24, 2022 at 02:47:58PM +0100, Jan Beulich wrote:
> >> On 20.01.2022 16:23, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/include/asm/msi.h
> >>> +++ b/xen/arch/x86/include/asm/msi.h
> >>> @@ -54,6 +54,7 @@
> >>>  #define MSI_ADDR_DEST_ID_SHIFT   12
> >>>  #define   MSI_ADDR_DEST_ID_MASK  0x00ff000
> >>>  #define  MSI_ADDR_DEST_ID(dest)  (((dest) << 
> >>> MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK)
> >>> +#define   MSI_ADDR_EXT_DEST_ID_MASK  0xfe0
> >>
> >> Especially the immediately preceding macro now becomes kind of stale.
> > 
> > Hm, I'm not so sure about that. We could expand the macro to place the
> > high bits in dest at bits 11:4 of the resulting address. However that
> > macro (MSI_ADDR_DEST_ID) is only used by Xen to compose its own
> > messages, so until we add support for the hypervisor itself to use the
> > extended destination ID mode there's not much point in modifying the
> > macro IMO.
> 
> Well, this is all unhelpful considering the different form of extended
> ID in Intel's doc. At least by way of a comment things need clarifying
> and potential pitfalls need pointing out imo.

Sure, will add some comments there.

> >>> --- a/xen/include/public/domctl.h
> >>> +++ b/xen/include/public/domctl.h
> >>> @@ -588,6 +588,7 @@ struct xen_domctl_bind_pt_irq {
> >>>  #define XEN_DOMCTL_VMSI_X86_DELIV_MASK   0x007000
> >>>  #define XEN_DOMCTL_VMSI_X86_TRIG_MASK0x008000
> >>>  #define XEN_DOMCTL_VMSI_X86_UNMASKED 0x01
> >>> +#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK 0xfe
> >>
> >> I'm not convinced it is a good idea to limit the overall destination
> >> ID width to 15 bits here - at the interface level we could as well
> >> permit more bits right away; the implementation would reject too high
> >> a value, of course. Not only with this I further wonder whether the
> >> field shouldn't be unsplit while extending it. You won't get away
> >> without bumping XEN_DOMCTL_INTERFACE_VERSION anyway (unless it was
> >> bumped already for 4.17) since afaics the unused bits of this field
> >> previously weren't checked for being zero. We could easily have 8
> >> bits vector, 16 bits flags, and 32 bits destination ID in the struct.
> >> And there would then still be 8 unused bits (which from now on we
> >> ought to check for being zero).
> > 
> > So I've made gflags a 64bit field, used the high 32bits for the
> > destination ID, and the low ones for flags. I've left gvec as a
> > separate field in the struct, as I don't want to require a
> > modification to QEMU, just a rebuild against the updated headers will
> > be enough.
> 
> Hmm, wait - if qemu uses this without going through a suitable
> abstraction in at least libxc, then we cannot _ever_ change this
> interface: If a rebuild was required, old qemu binaries would
> stop working with newer Xen. If such a dependency indeed exists,
> we'll need a prominent warning comment in the public header.

Hm, it's bad. The xc_domain_update_msi_irq interface uses a gflags
parameter that's the gflags parameter of xen_domctl_bind_pt_irq. Which
is even worse because it's not using the mask definitions from
domctl.h, but rather a copy of them named XEN_PT_GFLAGS_* that are
hardcoded in xen_pt_msi.c in QEMU code.

So we can likely expand the layout of gflags, but moving fields is not
an option. I think my original proposal of adding a
XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK mask is the less bad option until
we add a new stable interface for device passthrough for QEMU.

Thanks, Roger.



Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole

2022-02-04 Thread Oleksandr Andrushchenko


On 04.02.22 11:41, Julien Grall wrote:
> On 04/02/2022 09:01, Oleksandr Andrushchenko wrote:
>> On 04.02.22 10:51, Julien Grall wrote:
>>> Hi,
>>>
>>> On 04/02/2022 06:34, Oleksandr Andrushchenko wrote:
 From: Oleksandr Andrushchenko 

 Add a stub for is_memory_hole which is required for PCI passthrough
 on Arm.

 Signed-off-by: Oleksandr Andrushchenko 

 ---
 Cc: Julien Grall 
 Cc: Stefano Stabellini 
 ---
 New in v6
 ---
    xen/arch/arm/mm.c | 6 ++
    1 file changed, 6 insertions(+)

 diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
 index b1eae767c27c..c32e34a182a2 100644
 --- a/xen/arch/arm/mm.c
 +++ b/xen/arch/arm/mm.c
 @@ -1640,6 +1640,12 @@ unsigned long get_upper_mfn_bound(void)
    return max_page - 1;
    }
    +bool is_memory_hole(mfn_t start, mfn_t end)
 +{
 +    /* TODO: this needs to be properly implemented. */
>>>
>>> I was hoping to see a summary of the discussion from IRC somewhere in the 
>>> patch (maybe after ---). This would help to bring up to speed the others 
>>> that were not on IRC.
>> I am not quite sure what needs to be put here as the summary
>
> At least some details on why this is a TODO. Is it because you are unsure of 
> the implementation? Is it because you wanted to send early?...
>
> IOW, what are you expecting from the reviewers?
Well, I just need to allow PCI passthrough to be built on Arm at the moment.
Clearly, without this stub I can't do so. This is the only intention now.
Of course, while PCI passthrough on Arm is still not really enabled, those
who want to try it will otherwise need to revert the offending patch.
I am fine both ways.
>
>> Could you please help me with the exact message you would like to see?
>
> Here is a summary of the discussion (+ some of my follow-up thoughts):
>
> is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c 
> "xen/pci: detect when BARs are not suitably positioned") to check whether the 
> BARs are positioned outside of a valid memory range. This was introduced to 
> work around quirky firmware.
>
> In theory, this could also happen on Arm. In practice, this may not happen 
> but it sounds better to sanity-check that the BAR contains a "valid" I/O range.
>
> On x86, this is implemented by checking the region is not described in the 
> e820. IIUC, on Arm, the BARs have to be positioned in pre-defined ranges. So 
> I think it would be possible to implement is_memory_hole() by going through 
> the list of hostbridges and checking the ranges.
>
> But first, I'd like to confirm my understanding with Rahul, and others.
>
> If we were going to go this route, I would also rename the function to 
> better match what it is doing (i.e. it checks the BAR is correctly placed). 
> As a potential optimization/hardening for Arm, we could pass the hostbridge 
> so we don't have to walk all of them.
It seems this needs to live in the commit message then? That way it is easy to find,
as everything after "---" is going to be dropped on commit.
>
> Cheers,
>
Thank you,
Oleksandr

Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole

2022-02-04 Thread Julien Grall

On 04/02/2022 09:01, Oleksandr Andrushchenko wrote:

On 04.02.22 10:51, Julien Grall wrote:

Hi,

On 04/02/2022 06:34, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

Add a stub for is_memory_hole which is required for PCI passthrough
on Arm.

Signed-off-by: Oleksandr Andrushchenko 

---
Cc: Julien Grall 
Cc: Stefano Stabellini 
---
New in v6
---
   xen/arch/arm/mm.c | 6 ++
   1 file changed, 6 insertions(+)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index b1eae767c27c..c32e34a182a2 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1640,6 +1640,12 @@ unsigned long get_upper_mfn_bound(void)
   return max_page - 1;
   }
   +bool is_memory_hole(mfn_t start, mfn_t end)
+{
+    /* TODO: this needs to be properly implemented. */


I was hoping to see a summary of the discussion from IRC somewhere in the patch 
(maybe after ---). This would help to bring up to speed the others that were 
not on IRC.

I am not quite sure what needs to be put here as the summary


At least some details on why this is a TODO. Is it because you are 
unsure of the implementation? Is it because you wanted to send early?...


IOW, what are you expecting from the reviewers?


Could you please help me with the exact message you would like to see?


Here is a summary of the discussion (+ some of my follow-up thoughts):

is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c 
"xen/pci: detect when BARs are not suitably positioned") to check 
whether the BARs are positioned outside of a valid memory range. This was 
introduced to work around quirky firmware.


In theory, this could also happen on Arm. In practice, this may not 
happen but it sounds better to sanity-check that the BAR contains 
a "valid" I/O range.


On x86, this is implemented by checking the region is not described 
in the e820. IIUC, on Arm, the BARs have to be positioned in pre-defined 
ranges. So I think it would be possible to implement is_memory_hole() by 
going through the list of hostbridges and checking the ranges.


But first, I'd like to confirm my understanding with Rahul, and others.

If we were going to go this route, I would also rename the function to 
better match what it is doing (i.e. it checks the BAR is correctly 
placed). As a potential optimization/hardening for Arm, we could pass 
the hostbridge so we don't have to walk all of them.


Cheers,

--
Julien Grall



Re: [PATCH v3 2/2] x86/mm: tidy XENMEM_{get,set}_pod_target handling

2022-02-04 Thread Jan Beulich
On 04.02.2022 10:28, Roger Pau Monné wrote:
> On Wed, Feb 02, 2022 at 04:29:37PM +0100, Jan Beulich wrote:
>> On 02.02.2022 16:14, Roger Pau Monné wrote:
>>> On Tue, Jan 04, 2022 at 10:41:53AM +0100, Jan Beulich wrote:
 Do away with the "pod_target_out_unlock" label. In particular by folding
 if()-s, the logic can be expressed with less code (and no goto-s) this
 way.

 Limit scope of "p2m", constifying it at the same time.
>>>
>>> Is this stale? I cannot find any reference to a p2m variable in the
>>> chunks below.
>>
>> Indeed it is, leftover from rebasing over the introduction of
>> p2m_pod_get_mem_target() in what is now patch 1. Dropped.
> 
> I'm happy with this change with the commit adjusted:
> 
> Acked-by: Roger Pau Monné 

Thanks.

> Not sure if you can commit this now regardless of patch 1?

I think it can be moved ahead; there looks to be only a minor contextual
dependency.

Jan




Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field

2022-02-04 Thread Jan Beulich
On 04.02.2022 10:23, Roger Pau Monné wrote:
> On Mon, Jan 24, 2022 at 02:47:58PM +0100, Jan Beulich wrote:
>> On 20.01.2022 16:23, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/include/asm/msi.h
>>> +++ b/xen/arch/x86/include/asm/msi.h
>>> @@ -54,6 +54,7 @@
>>>  #define MSI_ADDR_DEST_ID_SHIFT 12
>>>  #define MSI_ADDR_DEST_ID_MASK  0x00ff000
>>>  #define  MSI_ADDR_DEST_ID(dest)(((dest) << 
>>> MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK)
>>> +#define MSI_ADDR_EXT_DEST_ID_MASK  0xfe0
>>
>> Especially the immediately preceding macro now becomes kind of stale.
> 
> Hm, I'm not so sure about that. We could expand the macro to place the
> high bits in dest at bits 11:4 of the resulting address. However that
> macro (MSI_ADDR_DEST_ID) is only used by Xen to compose its own
> messages, so until we add support for the hypervisor itself to use the
> extended destination ID mode there's not much point in modifying the
> macro IMO.

Well, this is all unhelpful considering the different form of extended
ID in Intel's doc. At least by way of a comment things need clarifying
and potential pitfalls need pointing out imo.

>>> --- a/xen/include/public/domctl.h
>>> +++ b/xen/include/public/domctl.h
>>> @@ -588,6 +588,7 @@ struct xen_domctl_bind_pt_irq {
>>>  #define XEN_DOMCTL_VMSI_X86_DELIV_MASK   0x007000
>>>  #define XEN_DOMCTL_VMSI_X86_TRIG_MASK0x008000
>>>  #define XEN_DOMCTL_VMSI_X86_UNMASKED 0x01
>>> +#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK 0xfe
>>
>> I'm not convinced it is a good idea to limit the overall destination
>> ID width to 15 bits here - at the interface level we could as well
>> permit more bits right away; the implementation would reject too high
>> a value, of course. Not only with this I further wonder whether the
>> field shouldn't be unsplit while extending it. You won't get away
>> without bumping XEN_DOMCTL_INTERFACE_VERSION anyway (unless it was
>> bumped already for 4.17) since afaics the unused bits of this field
>> previously weren't checked for being zero. We could easily have 8
>> bits vector, 16 bits flags, and 32 bits destination ID in the struct.
>> And there would then still be 8 unused bits (which from now on we
>> ought to check for being zero).
> 
> So I've made gflags a 64bit field, used the high 32bits for the
> destination ID, and the low ones for flags. I've left gvec as a
> separate field in the struct, as I don't want to require a
> modification to QEMU, just a rebuild against the updated headers will
> be enough.

Hmm, wait - if qemu uses this without going through a suitable
abstraction in at least libxc, then we cannot _ever_ change this
interface: If a rebuild was required, old qemu binaries would
stop working with newer Xen. If such a dependency indeed exists,
we'll need a prominent warning comment in the public header.

Jan

> I've been wondering about this interface though
> (xen_domctl_bind_pt_irq), and it would seem better to just pass the
> raw MSI address and data fields from the guest and let Xen do the
> decoding. This however is not trivial to do as we would need to keep
> the previous interface anyway as it's used by QEMU. Maybe we could
> have some kind of union between a pair of address and data fields and
> a gflags one that would match the native layout, but as said not
> trivial (and would require using anonymous unions which I'm not sure
> are accepted even for domctls in the public headers).
> 
> Thanks, Roger.
> 




Re: [PATCH v3 2/2] x86/mm: tidy XENMEM_{get,set}_pod_target handling

2022-02-04 Thread Roger Pau Monné
On Wed, Feb 02, 2022 at 04:29:37PM +0100, Jan Beulich wrote:
> On 02.02.2022 16:14, Roger Pau Monné wrote:
> > On Tue, Jan 04, 2022 at 10:41:53AM +0100, Jan Beulich wrote:
> >> Do away with the "pod_target_out_unlock" label. In particular by folding
> >> if()-s, the logic can be expressed with less code (and no goto-s) this
> >> way.
> >>
> >> Limit scope of "p2m", constifying it at the same time.
> > 
> > Is this stale? I cannot find any reference to a p2m variable in the
> > chunks below.
> 
> Indeed it is, leftover from rebasing over the introduction of
> p2m_pod_get_mem_target() in what is now patch 1. Dropped.

I'm happy with this change with the commit adjusted:

Acked-by: Roger Pau Monné 

Not sure if you can commit this now regardless of patch 1?

Thanks, Roger.



Re: [PATCH] x86/Xen: streamline (and fix) PV CPU enumeration

2022-02-04 Thread Juergen Gross

On 01.02.22 11:57, Jan Beulich wrote:

This started out with me noticing that "dom0_max_vcpus=<N>" with <N>
larger than the number of physical CPUs reported through ACPI tables
would not bring up the "excess" vCPU-s. Addressing this is the primary
purpose of the change; CPU maps handling is being tidied only as far as
is necessary for the change here (with the effect of also avoiding the
setting up of too much per-CPU infrastructure, i.e. for CPUs which can
never come online).

Noticing that xen_fill_possible_map() is called way too early, whereas
xen_filter_cpu_maps() is called too late (after per-CPU areas were
already set up), and further observing that each of the functions serves
only one of Dom0 or DomU, it looked like it was better to simplify this.
Use the .get_smp_config hook instead, uniformly for Dom0 and DomU.
xen_fill_possible_map() can be dropped altogether, while
xen_filter_cpu_maps() is re-purposed but not otherwise changed.

Signed-off-by: Jan Beulich 


Pushed to xen/tip.git for-linus-5.17a


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: x86: insn-eval.c's use of native_store_gdt()

2022-02-04 Thread Jan Beulich
On 30.11.2021 12:09, Jan Beulich wrote:
> I think it is b968e84b509d ("x86/iopl: Fake iopl(3) CLI/STI usage")
> which uncovered an issue with get_desc() trying to access the GDT, as
> introduced by 670f928ba09b ("x86/insn-eval: Add utility function to
> get segment descriptor"). When running in a PV domain under Xen, the
> (hypervisor's) GDT isn't accessible; with UMIP enabled by Xen even
> SGDT wouldn't work, as the kernel runs in ring 3.
> 
> There's currently no hypercall to retrieve a descriptor from the GDT.
> It is instead assumed that kernels know where their present GDT
> lives. Can the native_store_gdt() be replaced there, please?
> 
> For context (I don't think it should matter much here) I'm observing
> this with the kernel put underneath a rather old distro, where
> hwclock triggers this path.

I'd like to note that the issue still exists in 5.16.

Jan




Re: [PATCH v2] Improve docs for IOCTL_GNTDEV_MAP_GRANT_REF

2022-02-04 Thread Juergen Gross

On 31.01.22 18:23, Demi Marie Obenour wrote:

The current implementation of gntdev guarantees that the first call to
IOCTL_GNTDEV_MAP_GRANT_REF will set @index to 0.  This is required to
use gntdev for Wayland, which is a future desire of Qubes OS.
Additionally, requesting zero grants results in an error, but this was
not documented either.  Document both of these.

Signed-off-by: Demi Marie Obenour 
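[As a usage illustration of the documented behavior: a sketch only, assuming a Xen guest with /dev/xen/gntdev already open at fd; field names follow Linux's uapi <xen/gntdev.h>, and remote_domid/gref are assumed inputs, with error handling elided:]

```c
/* Sketch: map one grant offered by a remote domain.  Not runnable
 * outside a Xen domain; fd, remote_domid and gref are assumed. */
struct ioctl_gntdev_map_grant_ref map = {
    .count = 1,                     /* a count of 0 yields an error */
};

map.refs[0].domid = remote_domid;
map.refs[0].ref   = gref;

if ( ioctl(fd, IOCTL_GNTDEV_MAP_GRANT_REF, &map) == 0 )
{
    /* Per the documented guarantee, the first successful call on a
     * fresh fd sets map.index to 0; either way, map.index is the
     * offset to pass to mmap() for the mapped grant. */
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, map.index);
    /* ... use the mapped grant via p ... */
}
```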


Pushed to xen/tip.git for-linus-5.17a


Juergen




Re: [PATCH v2] xen: update missing ioctl magic numers documentation

2022-02-04 Thread Juergen Gross

On 31.01.22 17:19, Randy Dunlap wrote:

Add missing ioctl "magic numbers" for various Xen interfaces
(xenbus_dev.h, gntalloc.h, gntdev.h, and privcmd.h).

Signed-off-by: Randy Dunlap 


Pushed to xen/tip.git for-linus-5.17a


Juergen




Re: [PATCH] xen: xenbus_dev.h: delete incorrect file name

2022-02-04 Thread Juergen Gross

On 30.01.22 20:17, Randy Dunlap wrote:

It is better/preferred not to include file names in source files
because (a) they are not needed and (b) they can be incorrect,
so just delete this incorrect file name.

Signed-off-by: Randy Dunlap 


Pushed to xen/tip.git for-linus-5.17a


Juergen





Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field

2022-02-04 Thread Roger Pau Monné
On Mon, Jan 24, 2022 at 02:47:58PM +0100, Jan Beulich wrote:
> On 20.01.2022 16:23, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/include/asm/msi.h
> > +++ b/xen/arch/x86/include/asm/msi.h
> > @@ -54,6 +54,7 @@
> >  #define MSI_ADDR_DEST_ID_SHIFT 12
> >  #define MSI_ADDR_DEST_ID_MASK  0x00ff000
> >  #define  MSI_ADDR_DEST_ID(dest)(((dest) << 
> > MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK)
> > +#define MSI_ADDR_EXT_DEST_ID_MASK  0xfe0
> 
> Especially the immediately preceding macro now becomes kind of stale.

Hm, I'm not so sure about that. We could expand the macro to place the
high bits in dest at bits 11:4 of the resulting address. However that
macro (MSI_ADDR_DEST_ID) is only used by Xen to compose its own
messages, so until we add support for the hypervisor itself to use the
extended destination ID mode there's not much point in modifying the
macro IMO.

> 
> > --- a/xen/drivers/passthrough/x86/hvm.c
> > +++ b/xen/drivers/passthrough/x86/hvm.c
> > @@ -269,7 +269,7 @@ int pt_irq_create_bind(
> >  {
> >  case PT_IRQ_TYPE_MSI:
> >  {
> > -uint8_t dest, delivery_mode;
> > +unsigned int dest, delivery_mode;
> >  bool dest_mode;
> 
> If you touch delivery_mode's type, wouldn't that better become bool?
> 
> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -588,6 +588,7 @@ struct xen_domctl_bind_pt_irq {
> >  #define XEN_DOMCTL_VMSI_X86_DELIV_MASK   0x007000
> >  #define XEN_DOMCTL_VMSI_X86_TRIG_MASK0x008000
> >  #define XEN_DOMCTL_VMSI_X86_UNMASKED 0x01
> > +#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK 0xfe
> 
> I'm not convinced it is a good idea to limit the overall destination
> ID width to 15 bits here - at the interface level we could as well
> permit more bits right away; the implementation would reject too high
> a value, of course. Not only with this I further wonder whether the
> field shouldn't be unsplit while extending it. You won't get away
> without bumping XEN_DOMCTL_INTERFACE_VERSION anyway (unless it was
> bumped already for 4.17) since afaics the unused bits of this field
> previously weren't checked for being zero. We could easily have 8
> bits vector, 16 bits flags, and 32 bits destination ID in the struct.
> And there would then still be 8 unused bits (which from now on we
> ought to check for being zero).

So I've made gflags a 64bit field, used the high 32bits for the
destination ID, and the low ones for flags. I've left gvec as a
separate field in the struct, as I don't want to require a
modification to QEMU, just a rebuild against the updated headers will
be enough.

I've been wondering about this interface though
(xen_domctl_bind_pt_irq), and it would seem better to just pass the
raw MSI address and data fields from the guest and let Xen do the
decoding. This however is not trivial to do as we would need to keep
the previous interface anyway as it's used by QEMU. Maybe we could
have some kind of union between a pair of address and data fields and
a gflags one that would match the native layout, but as said not
trivial (and would require using anonymous unions which I'm not sure
are accepted even for domctls in the public headers).

Thanks, Roger.



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Jan Beulich
On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> On 04.02.22 09:52, Jan Beulich wrote:
>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
>>> uint16_t cmd, bool rom_only)
>>>   continue;
>>>   }
>>>   
>>> +spin_lock(&tmp->vpci_lock);
>>> +if ( !tmp->vpci )
>>> +{
>>> +spin_unlock(&tmp->vpci_lock);
>>> +continue;
>>> +}
>>>   for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>>   {
>>> const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, 
>>> uint16_t cmd, bool rom_only)
>>>   rc = rangeset_remove_range(mem, start, end);
>>>   if ( rc )
>>>   {
>>> +spin_unlock(&tmp->vpci_lock);
>>>   printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: 
>>> %d\n",
>>>  start, end, rc);
>>>   rangeset_destroy(mem);
>>>   return rc;
>>>   }
>>>   }
>>> +spin_unlock(&tmp->vpci_lock);
>>>   }
>> At the first glance this simply looks like another unjustified (in the
>> description) change, as you're not converting anything here but you
>> actually add locking (and I realize this was there before, so I'm sorry
>> for not pointing this out earlier).
> Well, I thought that the description already has "...the lock can be
> used (and in a few cases is used right away) to check whether vpci
> is present" and this is enough for such uses as here.
>>   But then I wonder whether you
>> actually tested this, since I can't help getting the impression that
>> you're introducing a live-lock: The function is called from cmd_write()
>> and rom_write(), which in turn are called out of vpci_write(). Yet that
>> function already holds the lock, and the lock is not (currently)
>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>> the locking looks to be entirely unnecessary.)
> Well, you are correct: if tmp != pdev then it is correct to acquire
> the lock. But if tmp == pdev and rom_only == true
> then we'll deadlock.
> 
> It seems we need to have the locking conditional, e.g. only lock
> if tmp != pdev

Which will address the live-lock, but introduce ABBA deadlock potential
between the two locks.

>>> @@ -222,10 +239,10 @@ static int msix_read(struct vcpu *v, unsigned long 
>>> addr, unsigned int len,
>>>   break;
>>>   }
>>>   
>>> +msix_put(msix);
>>>   return X86EMUL_OKAY;
>>>   }
>>>   
>>> -spin_lock(&msix->pdev->vpci->lock);
>>>   entry = get_entry(msix, addr);
>>>   offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
>> You're increasing the locked region quite a bit here. If this is really
>> needed, it wants explaining. And if this is deemed acceptable as a
>> "side effect", it wants justifying or at least stating imo. Same for
>> msix_write() then, obviously.
> Yes, I do increase the locked region here, but the msix variable needs
> to be protected all the time, so it seems obvious that it should remain
> under the lock.

What does the msix variable have to do with the vPCI lock? If you see
a need to grow the locked region here, then surely this is independent
of your conversion of the lock, and hence wants to be a prereq fix
(which may in fact want/need backporting).

>>> @@ -327,7 +334,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>>   if ( !pdev )
>>>   return vpci_read_hw(sbdf, reg, size);
>>>   
>>> -spin_lock(&pdev->vpci->lock);
>>> +spin_lock(&pdev->vpci_lock);
>>> +if ( !pdev->vpci )
>>> +{
>>> +spin_unlock(&pdev->vpci_lock);
>>> +return vpci_read_hw(sbdf, reg, size);
>>> +}
>> Didn't you say you would add justification of this part of the change
>> (and its vpci_write() counterpart) to the description?
> Again, I am referring to the commit message as described above

No, sorry - that part applies only to what inside the parentheses of
if(). But on the intermediate version (post-v5 in a 4-patch series) I
did say:

"In this case as well as in its write counterpart it becomes even more
 important to justify (in the description) the new behavior. It is not
 obvious at all that the absence of a struct vpci should be taken as
 an indication that the underlying device needs accessing instead.
 This also cannot be inferred from the "!pdev" case visible in context.
 In that case we have no record of a device at this SBDF, and hence the
 fallback pretty clearly is a "just in case" one. Yet if we know of a
 device, the absence of a struct vpci may mean various possible things."

If it wasn't obvious: The comment was on the use of vpci_read_hw() on
this path, not redundant with the earlier one regarding the added
"is vpci non-NULL" in a few places.

Jan




[qemu-mainline test] 168000: tolerable FAIL - PUSHED

2022-02-04 Thread osstest service owner
flight 168000 qemu-mainline real [real]
flight 168007 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/168000/
http://logs.test-lab.xenproject.org/osstest/logs/168007/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 168007-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 167991
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167991
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167991
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeatfail  like 167991
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167991
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 167991
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 167991
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167991
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167991
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass

version targeted for testing:
 qemuu                8f3e5ce773c62bb5c4a847f3a9a5c98bbb3b359f
baseline version:
 qemuu   

Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole

2022-02-04 Thread Oleksandr Andrushchenko
Hi, Julien!

On 04.02.22 10:51, Julien Grall wrote:
> Hi,
>
> On 04/02/2022 06:34, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko 
>>
>> Add a stub for is_memory_hole which is required for PCI passthrough
>> on Arm.
>>
>> Signed-off-by: Oleksandr Andrushchenko 
>>
>> ---
>> Cc: Julien Grall 
>> Cc: Stefano Stabellini 
>> ---
>> New in v6
>> ---
>>   xen/arch/arm/mm.c | 6 ++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
>> index b1eae767c27c..c32e34a182a2 100644
>> --- a/xen/arch/arm/mm.c
>> +++ b/xen/arch/arm/mm.c
>> @@ -1640,6 +1640,12 @@ unsigned long get_upper_mfn_bound(void)
>>   return max_page - 1;
>>   }
>>   +bool is_memory_hole(mfn_t start, mfn_t end)
>> +{
>> +    /* TODO: this needs to be properly implemented. */
>
> I was hoping to see a summary of the discussion from IRC somewhere in the 
> patch (maybe after ---). This would help to bring up to speed the others that 
> were not on IRC.
I am not quite sure what needs to be put here as the summary.
Could you please help me with the exact message you would like to see?
>
>> +    return true;
>> +}
>> +
>>   /*
>>    * Local variables:
>>    * mode: C
>
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Oleksandr Andrushchenko
Hi, Jan!

On 04.02.22 09:52, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>   continue;
>>   }
>>   
>> +spin_lock(&tmp->vpci_lock);
>> +if ( !tmp->vpci )
>> +{
>> +spin_unlock(&tmp->vpci_lock);
>> +continue;
>> +}
>>   for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>   {
>>   const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>   rc = rangeset_remove_range(mem, start, end);
>>   if ( rc )
>>   {
>> +spin_unlock(&tmp->vpci_lock);
>>   printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>>  start, end, rc);
>>   rangeset_destroy(mem);
>>   return rc;
>>   }
>>   }
>> +spin_unlock(&tmp->vpci_lock);
>>   }
> At the first glance this simply looks like another unjustified (in the
> description) change, as you're not converting anything here but you
> actually add locking (and I realize this was there before, so I'm sorry
> for not pointing this out earlier).
Well, I thought that the description already has "...the lock can be
used (and in a few cases is used right away) to check whether vpci
is present" and this is enough for such uses as here.
>   But then I wonder whether you
> actually tested this, since I can't help getting the impression that
> you're introducing a live-lock: The function is called from cmd_write()
> and rom_write(), which in turn are called out of vpci_write(). Yet that
> function already holds the lock, and the lock is not (currently)
> recursive. (For the 3rd caller of the function - init_bars() - otoh
> the locking looks to be entirely unnecessary.)
Well, you are correct: if tmp != pdev then it is correct to acquire
the lock. But if tmp == pdev and rom_only == true
then we'll deadlock.

It seems we need to have the locking conditional, e.g. only lock
if tmp != pdev
>
> Then again this was present already even in Roger's original patch, so
> I guess I must be missing something ...
>
>> --- a/xen/drivers/vpci/msix.c
>> +++ b/xen/drivers/vpci/msix.c
>> @@ -138,7 +138,7 @@ static void control_write(const struct pci_dev *pdev, unsigned int reg,
>>   pci_conf_write16(pdev->sbdf, reg, val);
>>   }
>>   
>> -static struct vpci_msix *msix_find(const struct domain *d, unsigned long addr)
>> +static struct vpci_msix *msix_get(const struct domain *d, unsigned long addr)
>>   {
>>   struct vpci_msix *msix;
>>   
>> @@ -150,15 +150,29 @@ static struct vpci_msix *msix_find(const struct domain *d, unsigned long addr)
>>   for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
>>   if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled &&
>>VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) )
>> +{
>> +spin_lock(&msix->pdev->vpci_lock);
>>   return msix;
>> +}
> I think deliberately returning with a lock held requires a respective
> comment ahead of the function.
Ok, will add a comment
>
>>   }
>>   
>>   return NULL;
>>   }
>>   
>> +static void msix_put(struct vpci_msix *msix)
>> +{
>> +if ( !msix )
>> +return;
>> +
>> +spin_unlock(&msix->pdev->vpci_lock);
>> +}
> Maybe shorter
>
>  if ( msix )
>  spin_unlock(&msix->pdev->vpci_lock);
Looks good
>
> ? Yet there's only one case where you may pass NULL in here, so
> maybe it's better anyway to move the conditional ...
>
>>   static int msix_accept(struct vcpu *v, unsigned long addr)
>>   {
>> -return !!msix_find(v->domain, addr);
>> +struct vpci_msix *msix = msix_get(v->domain, addr);
>> +
>> +msix_put(msix);
>> +return !!msix;
>>   }
> ... here?
Yes, I can have that check here, but what if there is yet
another caller of the same? I am not sure whether it is better
to have the check in msix_get or at the caller site.
At the moment (with a single place with NULL possible) I can
move the check. @Roger?
>
>> @@ -186,7 +200,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
>>unsigned long *data)
>>   {
>>   const struct domain *d = v->domain;
>> -struct vpci_msix *msix = msix_find(d, addr);
>> +struct vpci_msix *msix = msix_get(d, addr);
>>   const struct vpci_msix_entry *entry;
>>   unsigned int offset;
>>   
>> @@ -196,7 +210,10 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
>>   return X86EMUL_RETRY;
>>   
>>   if ( !access_allowed(msix->pdev, addr, len) )
>> +{
>> +msix_put(msix);
>>   return X86EMUL_OKAY;
>> +}
>>   
>>   if ( 

Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole

2022-02-04 Thread Julien Grall

Hi,

On 04/02/2022 06:34, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

Add a stub for is_memory_hole which is required for PCI passthrough
on Arm.

Signed-off-by: Oleksandr Andrushchenko 

---
Cc: Julien Grall 
Cc: Stefano Stabellini 
---
New in v6
---
  xen/arch/arm/mm.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index b1eae767c27c..c32e34a182a2 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1640,6 +1640,12 @@ unsigned long get_upper_mfn_bound(void)
  return max_page - 1;
  }
  
+bool is_memory_hole(mfn_t start, mfn_t end)
+{
+    /* TODO: this needs to be properly implemented. */

I was hoping to see a summary of the discussion from IRC somewhere in
the patch (maybe after ---). This would help to bring up to speed the
others that were not on IRC.

+    return true;
+}
+
  /*
   * Local variables:
   * mode: C


--
Julien Grall



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Jan Beulich
On 04.02.2022 09:13, Oleksandr Andrushchenko wrote:
> On 04.02.22 09:52, Jan Beulich wrote:
>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>
>> At the first glance this simply looks like another unjustified (in the
>> description) change, as you're not converting anything here but you
>> actually add locking (and I realize this was there before, so I'm sorry
>> for not pointing this out earlier). But then I wonder whether you
>> actually tested this
> The cover letter already states that I tested two x86
> configurations and also tested this on Arm...

Okay, I'm sorry then. But could you then please point out where I'm
wrong with my analysis?

Jan




Re: [PATCH] tools/guest: Fix comment regarding CPUID compatibility

2022-02-04 Thread Jan Beulich
On 03.02.2022 19:10, Andrew Cooper wrote:
> It was Xen 4.14 where CPUID data was added to the migration stream, and 4.13
> that we need to worry about with regards to compatibility.  Xen 4.12 isn't
> relevant.
> 
> Expand and correct the commentary.
> 
> Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore")

But doesn't this commit amend 685e922d6f30 ("tools/libxc: Rework
xc_cpuid_apply_policy() to use {get,set}_cpu_policy()"), which is
where DEF_MAX_* disappeared? That was a 4.13 change, and iirc that
was also why the commit above moved the 4.13/4.14 boundary related
comment from outside of the if() to inside it.

While looking at this, wasn't Roger's change incomplete, in that
for Intel the extended leaf upper bound was 0x80000008 in 4.12?

Jan




Re: [PATCH v6 12/13] xen/arm: translate virtual PCI bus topology for guests

2022-02-04 Thread Oleksandr Andrushchenko
Hi, Jan!

On 04.02.22 09:56, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -168,6 +168,35 @@ static void vpci_remove_virtual_device(struct domain *d,
>>   pdev->vpci->guest_sbdf.sbdf = ~0;
>>   }
>>   
>> +/*
>> + * Find the physical device which is mapped to the virtual device
>> + * and translate virtual SBDF to the physical one.
>> + */
>> +bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
>> +{
>> +struct pci_dev *pdev;
>> +
>> +ASSERT(!is_hardware_domain(d));
> In addition to this, don't you also need to assert that pcidevs_lock is
> held (or if it isn't, you'd need to acquire it) for ...
>
>> +for_each_pdev( d, pdev )
> ... this to be race-free?
Yes, you are right, this needs pcidevs_lock().
Will add.
>
> Jan
>
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Oleksandr Andrushchenko
Hi, Jan!

On 04.02.22 09:52, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>
> At the first glance this simply looks like another unjustified (in the
> description) change, as you're not converting anything here but you
> actually add locking (and I realize this was there before, so I'm sorry
> for not pointing this out earlier). But then I wonder whether you
> actually tested this
The cover letter already states that I tested two x86
configurations and also tested this on Arm...
Would you like to see the relevant logs?

Thank you,
Oleksandr