On 06/03/2018 11:01, David Hildenbrand wrote:
On 27.02.2018 16:44, Tony Krowiak wrote:
This patch series is the QEMU counterpart to the KVM/kernel support for
guest dedicated crypto adapters. The KVM/kernel model is built on the
VFIO mediated device framework and provides the infrastructure for
granting exclusive guest access to crypto devices installed on the Linux
host. This patch series introduces a new QEMU command line option, QEMU
object model and CPU model features to exploit the KVM/kernel model.
See the detailed specifications for AP virtualization provided by this
patch set in docs/vfio-ap.txt for a more complete discussion of the
design introduced by this patch series.
v1 -> v2 Change log:
* Removed unnecessary S390APMatrixDevice, S390APMatrixDeviceClass
* Removed ioctl to configure the AP matrix for the guest: letting the
vfio_ap device driver's 'open' callback configure the AP matrix
for the guest
* Removed masks from object model: Unnecessary at this point because they
are not currently used
* VFIOAPMatrixDevice to VFIOAPDevice
* VFIOAPMatrixDeviceClass to VFIOAPDeviceClass
* APMatrixDevice to APDevice
* APMatrixDeviceClass to APDeviceClass
* ap-matrix.c to ap.c (in hw/vfio)
* ap-matrix-device.c to ap-device.c (in hw/s390x)
* ap-matrix-device.h to ap-device.h (in include/hw/s390x)
* Added a CPU model feature for the AP facilities installed on the guest,
and facility features for QCI Instructions Available (STFLE.12) and AP
Facilities Test facility installed (STFLE.15).
Tony Krowiak (5):
s390x/ap: base Adjunct Processor (AP) object
s390x/vfio: ap: VFIO: linux header updates
s390x/vfio: ap: Introduce VFIO AP device
s390x/cpumodel: Set up CPU model for AP device support
s390: doc: detailed specifications for AP virtualization
I'm going to highlight an issue that stems from bad HW design: the lack
of an AP interpretation facility (indication). We have something like
that, e.g., for zPCI (and, as far as I remember, for all other I/O
besides AP).
Let's assume L1 provides AP to L2.
Let's assume L2 provides AP to L3.
L2 can blindly forward AP devices to L3 because it sees the AP facility.
This requires AP vSIE support, and we have no separate way of indicating
that support; it comes with the AP feature. So let's assume L2 does not
emulate devices but uses interpretation for L3.
Everything is fine as long as L1 does not emulate AP
devices/instructions for L2. All instructions are interpreted by HW.
If L1 emulates AP, there is no need for it to set any bit in the L2 SIE
CRYCB. In fact, we had better not set any bit in the CRYCB.
But what happens if L1 emulates AP devices for L2? Interpretation is
disabled and QEMU handles it.
However, L2 can simply forward AP devices to L3. At this point, we must
also intercept and emulate AP instructions issued by L3 in _L1_.
If L2 forwards devices to L3 through SIE ECA.28 but no bit is set in
the CRYCB of L2, L3 will not see any device.
This is what we call the nightmare of nested virtualization (see x86),
because we have to emulate L3 instructions in L1; even worse, not in
L1 kernel space but in L1 user space.
As soon as one level begins to virtualize, all levels under it must
virtualize too, so that L3 instructions are handled in L2, which in
turn issues instructions that are handled in L1.
Long story short:
Making this scenario work would require a _huge_ effort (going to user
space with nested guest state, or communicating with the user space
part using some other mechanism).
A funny game with big overhead but same virtualization whatever the
So we could never reliably provide the AP feature together with the SIE
feature.
I think we should change this sentence slightly to:
We cannot provide SIE interpretation to a guest if any of the guest
levels below it (N-1 down to L1) does not use SIE interpretation.
Nothing bad will occur for the host, the hardware or other guests,
but the guest will just not get any device.
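To make the proposed rule concrete, here is a toy model of it. This is
not KVM or QEMU code: APMode and interpreted_ap_visible are hypothetical
names, and the model only encodes the claim above that an emulating
level leaves the CRYCB of its guest empty, so interpretation further up
the chain yields no devices.

```python
from enum import Enum
from typing import Sequence


class APMode(Enum):
    """How level Li provides AP to L(i+1) (hypothetical toy model)."""
    INTERPRET = "interpret"  # passthrough: ECA.28 set, CRYCB bits set
    EMULATE = "emulate"      # AP instructions intercepted, emulated in user space


def interpreted_ap_visible(modes: Sequence[APMode]) -> bool:
    """Return True if the topmost guest sees AP devices via interpretation.

    modes[i] describes how Li provides AP to L(i+1); L0 is the host.
    Per the rule above, a single emulating level anywhere below the
    topmost guest means the interpreted devices are not visible to it.
    """
    return all(mode is APMode.INTERPRET for mode in modes)


# Every level interprets: L3 sees the passed-through devices.
print(interpreted_ap_visible([APMode.INTERPRET] * 3))
# L1 emulates AP for L2, L2 interprets for L3: L3 sees no device.
print(interpreted_ap_visible(
    [APMode.INTERPRET, APMode.EMULATE, APMode.INTERPRET]))
```

The model deliberately ignores the "emulate in L1 on behalf of L3" path,
since that is exactly the effort the thread argues against.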
We want to avoid interdependencies between CPU features, because
anything else makes CPU feature detection ugly (CMMA is a good example
and the only exception so far).
Long story even shorter:
No emulated AP devices with KVM.
I agree with: KVM should never set bits in CRYCB for emulated devices.
Linux/KVM/QEMU in Böblingen - Germany