On Thu, Dec 21, 2017 at 11:12:06AM +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2017-12-20 at 16:09 +1100, David Gibson wrote:
> >
> > As you've suggested yourself, I think we might need to more
> > explicitly model the different components of the XIVE system.  As part
> > of that, I think you need to be clearer in this base skeleton about
> > exactly what component your XIVE object represents.
> >
> > If the answer is "the overall thing" I suspect that's not what you
> > want - I had one of those for XICs which proved to be a mistake
> > (eventually replaced by the XICSFabric interface).
> >
> > Changing the model later isn't impossible, but doing so without
> > breaking migration can be a real pain, so I think it's worth a
> > reasonable effort to try and get it right initially.
>
> Note: we do need to speed things up a bit, as having exploitation mode
> in KVM will significantly help with IPI performance among other things.
>
> I'm about ready to do the KVM bits. The one thing we need to discuss
> and figure a good design for is how we map all those interrupt control
> pages into qemu.
>
> Each interrupt (either PCIe pass-through or the "generic XIVE IPIs"
> which are used for guest IPIs and for vio/virtio/emulated interrupts)
> comes with a "control page" (ESB page) which needs to be mapped into
> the guest, and the generic IPIs also come with a trigger page which
> needs to be mapped into the guest for guest IPIs or OpenCAPI
> interrupts, or just qemu for emulated devices.
>
> Now there can be thousands of these critters. I certainly don't want to
> create thousands of VMAs in qemu and even less thousands of memory
> regions in KVM.
>
> So we need some kind of mechanism by which a single large VMA gets
> mmap'ed into qemu (or maybe a couple of these, but not too many) and
> the interrupt pages can be assigned to slots in there and demand
> faulted.

Ok, I see your point.  We'll definitely need to be able to map things
in as a block, rather than one by one.

> For the generic interrupts, this can probably be covered by KVM, adding
> some arch ioctls for allocating IPIs and mmap'ing that region etc...
>
> For pass-through, it's trickier, we don't want to mmap each irqfd
> individually for the above reason, so we want to "link" them to KVM. We
> don't want to allow qemu to take control of any arbitrary interrupt in
> the system though, so it has to be related to the ownership of the irqfd
> coming from vfio.
>
> OpenCAPI I suspect will be its own can of worms...
>
> Also, have we decided how the process of switching between XICS and
> XIVE will work vs. CAS? And how that will interact with KVM? I was
> thinking the kernel would implement a different KVM device type, ie
> the "emulated XICS" would remain KVM_DEV_TYPE_XICS and XIVE would be
> KVM_DEV_TYPE_XIVE.
>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
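
Illustration only, not part of the thread's proposal: the "single large
VMA with per-interrupt slots" idea can be sketched in plain userspace C.
Everything named here (struct esb_window, esb_window_init,
esb_window_slot, esb_window_destroy) is hypothetical. The sketch uses an
anonymous mapping as a stand-in, so the kernel's ordinary lazy
allocation plays the role of demand faulting. A real design would
presumably back the window with a KVM or vfio file descriptor and its
own fault handler, which the thread has not settled.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS / MAP_NORESERVE */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical descriptor: one large window holding many ESB-style pages. */
struct esb_window {
    uint8_t *base;      /* start of the single mapping (one VMA) */
    size_t page_size;   /* size of one slot, here one host page */
    size_t nr_slots;    /* number of interrupt slots in the window */
};

/*
 * Reserve a single mapping large enough for nr_slots pages.
 * Nothing is populated up front; pages are faulted in on first touch.
 * Returns 0 on success, -1 on failure.
 */
static int esb_window_init(struct esb_window *w, size_t nr_slots)
{
    long ps = sysconf(_SC_PAGESIZE);

    if (ps <= 0 || nr_slots == 0 || nr_slots > SIZE_MAX / (size_t)ps)
        return -1;

    void *p = mmap(NULL, nr_slots * (size_t)ps, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    w->base = p;
    w->page_size = (size_t)ps;
    w->nr_slots = nr_slots;
    return 0;
}

/* Translate a slot index into its page address; NULL if out of range. */
static void *esb_window_slot(const struct esb_window *w, size_t slot)
{
    if (slot >= w->nr_slots)
        return NULL;
    return w->base + slot * w->page_size;
}

/* Tear down the whole window with a single munmap(). */
static void esb_window_destroy(struct esb_window *w)
{
    munmap(w->base, w->nr_slots * w->page_size);
    w->base = NULL;
    w->nr_slots = 0;
}
```

The point of the sketch is only the shape: one mmap() and one munmap()
regardless of how many interrupts exist, with a slot index turned into
an address by arithmetic rather than by a per-interrupt mapping.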