On Fri, Jun 07, 2019 at 11:49:50AM +0200, Greg Kurz wrote: 65;5603;1c> On Fri, 7 Jun 2019 10:17:58 +0200 > Cédric Le Goater <c...@kaod.org> wrote: > > > On 07/06/2019 02:19, David Gibson wrote: > > > On Thu, Jun 06, 2019 at 07:08:59PM +0200, Greg Kurz wrote: > > >> If KVM is too old to support XIVE native exploitation mode, we might end > > >> up using the emulated XIVE after CAS. This is sub-optimal if KVM > > >> in-kernel > > >> XICS is available, which is the case most of the time. > > > > > > This is intentional. A predictable guest environment trumps performance. > > > > > > > I don't agree. > > > > If the user does not specify any specific interrupt mode, we should favor > > the faster one. > > > > This all boils down to the semantics of "dual"... "XICS AND XIVE" or "any > of XICS or XIVE" ?
It means xics and xive are available, guest gets to choose. > We already have a precedent when using pre-power9 compat mode. If the > user passes ic-mode=dual,max-compat-cpu=power8, we internally convert > "dual" to act as "xics". The rationale is that a guest running in > power8 architected mode isn't supposed to know about XIVE at all. > > Should we forbid this config and exit QEMU instead ? Uh.. yes, that would make sense, actually. However max-compat-cpu=power8 without an explicit ic-mode=xive or ic-mode=dual should act as ic-mode=xics. The difference here is that it's an explicitly user specified constraint, rather than a host implied constraint. > > Here is the current matrix (with this patch) for guests running on an > > old KVM, that is without KVM XIVE support. Let's discuss on what we > > want. > > > > kernel_irqchip > > > > (default) > > ic-mode allowed off on > > > > dual XICS KVM XICS emul.(3) XICS KVM (default mode) > > xics XICS KVM XICS emul. XICS KVM > > xive XIVE emul.(1) XIVE emul. QEMU failure (2) > > > > > > (1) QEMU warns with "warning: kernel_irqchip requested but unavailable: > > IRQ_XIVE capability must be present for KVM" > > (2) QEMU fails with "kernel_irqchip requested but unavailable: > > IRQ_XIVE capability must be present for KVM" > > (3) That is wrong I think, we should get XIVE emulated. > > > > This is the logical consequence of turning "dual" into "xics". > > > > > what you would want is XIVE emulation when ic-mode=dual and > > kernel_irqchip=allowed, which is the behavior with this patch (but there > > I guess you mean s/with/without > > > are reboot bugs) > > > > Yeah. If the semantics of "dual" is to always advertise XICS AND XIVE, and we > keep the current fallback behavior, then we need each implementation to have > proper init/teardown support as well as proper rollback on error paths. > > This is definitely not the case: rollback is missing in both in-kernel > XICS and XIVE code and the emulated XICS and XIVE don't have teardown. > > This can generate a variety of bugs, including QEMU crashes... the old KVM > case > is just one of them, but there might be others. > > > > > >> Also, an old KVM may not allow to destroy and re-create the KVM XICS, > > >> which > > >> is precisely what "dual" does during machine reset. This causes QEMU to > > >> try > > >> to switch to emulated XICS and to crash because RTAS call de-registration > > >> isn't handled correctly. We could possibly fix that, but again we would > > >> still end up with an emulated XICS or XIVE. > > > > > > Ugh, that's a problem. > > > > Yes. It's another problem around the way we cleanup the allocated resources. > > It should be another patch. > > > > Yeah, probably many other patches... > > > > > > >> "dual" is definitely not a good choice with older KVMs. Internally force > > >> XICS when we detect this. > > > > > > But this is not an acceptable solution. Silently changing the guest > > > visible environment based on host capabilities is never ok. > > > > If the host (KVM) doesn't have a capability, what is the point of trying > > to use it if we can do better. I know you are considering KVM/QEMU as a > > whole but who would run with kernel_irqchip=off ? > > > > The real problem is when you don't pass kernel_irqchip at all. Combined > with "dual", this gives a fair number of cases. Do we want to support > all possible transitions ? > > > > We must > > > either give the guest environment that the user has requested, or fail > > > outright. > > > > 'dual' mode means both and the user is not requesting XIVE. We are changing > > the priority of choices from : > > > > 1. KVM XIVE > > 2. QEMU XIVE > > 3. KVM XICS > > 4. QEMU XICS > > > > to: > > > > 1. KVM XIVE > > 2. KVM XICS > > 3. QEMU XIVE > > 4. QEMU XICS > > > > which is better I think. > > > > Using KVM XICS is indeed better than QEMU XIVE... but IIUC what David > is saying, kernel-irqchip semantics are: > > - on: user favors performance, at the expense of a reduced spectrum > of usable hosts > > - absent: user favors being able to start the VM on a wider spectrum > of hosts, at the possible expense of performance Correct. > - off: user wants the VM to start on any host, doesn't care for > performance at all Well, TBH, the main use of kernel-irqchip=off is for debugging and testing. > > So the only way to have 1. KVM XIVE 2.KVM XICS would be to pass "on". > > > C. > > > > > > > > > >> > > >> Signed-off-by: Greg Kurz <gr...@kaod.org> > > >> --- > > >> hw/ppc/spapr_irq.c | 10 ++++++++++ > > >> 1 file changed, 10 insertions(+) > > >> > > >> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c > > >> index 3156daf09381..d788bd662a7a 100644 > > >> --- a/hw/ppc/spapr_irq.c > > >> +++ b/hw/ppc/spapr_irq.c > > >> @@ -18,6 +18,7 @@ > > >> #include "hw/ppc/xics_spapr.h" > > >> #include "cpu-models.h" > > >> #include "sysemu/kvm.h" > > >> +#include "kvm_ppc.h" > > >> > > >> #include "trace.h" > > >> > > >> @@ -668,6 +669,15 @@ static void spapr_irq_check(SpaprMachineState > > >> *spapr, Error **errp) > > >> return; > > >> } > > >> } > > >> + > > >> + /* > > >> + * KVM may be too old to support XIVE, in which case we'd rather try > > >> + * to use the in-kernel XICS instead of the emulated XIVE. > > >> + */ > > >> + if (kvm_enabled() && !kvmppc_has_cap_xive() && > > >> + spapr->irq == &spapr_irq_dual) { > > >> + spapr->irq = &spapr_irq_xics; > > >> + } > > >> } > > >> > > >> /* > > >> > > > > > > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature