On Thu, 28 May 2020 13:21:12 +0200 Cornelia Huck <coh...@redhat.com> wrote:
> On Fri, 22 May 2020 23:04:51 +0200
> Halil Pasic <pa...@linux.ibm.com> wrote:
> 
> > On Wed, 20 May 2020 12:23:24 -0400
> > "Michael S. Tsirkin" <m...@redhat.com> wrote:

[..]

> > > So, how about this: switch iommu to on/off/auto.
> > 
> > Many thanks for the review, and sorry about the delay on my side. We
> > have holidays here in Germany and I was not motivated enough up until
> > now to check on my mails.
> > 
> > 
> > I've actually played with the thought of switching iommu_platform to
> > 'on/off/auto', but I didn't find an easy way to do it. I will look
> > again. This would be the first property of this kind in QEMU, or?
> 
> virtio-pci uses it for 'disable-legacy'.
> 

Thank you very much! This makes thinking about 'on/off/auto' much easier.

> > 
> > The 'on/off/auto' would be certainly much cleaner from a user-interface
> > perspective. The downsides are that it is more invasive, and more
> > complicated. I'm afraid that it would also leave more possibilities for
> > user error.
> 
> To me, on/off/auto sounds like a reasonable thing to do.
> 
> What possibilities of 'user error' do you see?

I will whip up a prototype first and then come back to you with more
details. The short answer is that, as I understand it, a user who isn't
very careful about all the bells and whistles will end up with a
partially or fully non-PV-compatible VM. I had an internal bug report
where a NIC was generated by default that, of course, did not have
iommu_platform='on'.

> Shouldn't we fence off
> misconfigurations, if the consequences would be disastrous?
> 

I fully agree! This is unfortunately currently not the case. My patch
takes the approach of avoiding misconfiguration in the first place,
instead of slapping the user for it.

> > 
> > > Add a property with a
> > > reasonable name "allow protected"? If set allow switch to protected
> > > memory and also set iommu auto to on by default. If not set then don't.
> > > 
> > 
> > I think we have "allow protected" already expressed via cpu models. I'm
> > also not sure how libvirt would react to the idea of a new machine
> > property for this. You did mean "allow protected" as machine property,
> > or?
> 
> "Unpack facility in cpu model" means "guest may transition into pv
> mode", right? What does it look like when the guest actually has
> transitioned?

Janosch has answered these. Will add my thoughts there.

> > 
> > AFAIU "allow protected" would be required for the !PV to PV switch, and
> > we would have to reject paravirtualized devices with iommu_platform='off'
> > on VM construction or hotplug (iommu_platform='auto/on' would be fine).
> > 
> > Could you please confirm that I understood this correctly?
> > 
> > 
> > > This will come handy for other things like migrating to hosts without
> > > protected memory support.
> > > 
> > 
> > This is already covered by cpu model AFAIK.
> 
> I don't think we'd want to migrate between pv and non-pv anyway?
> 

ditto

[..]

> > > 
> > > I don't really understand things fully but it looks like you are
> > > changing features of a device. If so this bothers me, resets
> > > happen at random times while driver is active, and we never
> > > expect features to change.
> > > 
> > 
> > Changing the device features is IMHO all right because the features can
> > change only immediately after a system reset and before the first vCPU
> > is run. That is ensured by two facts.
> > 
> > 
> > First, the feature can only change when ms->pv changes. That is on the
> > first reset after the VM entered or left the "protected virtualization"
> > mode of operation. And that switch requires a system reset. Because the
> > PV switch is initiated by the guest, and the guest is rebooted as a
> > consequence, the guest will never observe the change in features.
> 
> This really needs more comments, as it is not obvious to the casual
> reader. (I also stumbled over the resets.)

Sorry, where exactly would you like to have those extra comments?

> 
> But I wonder whether we are actually missing those subsystems resets
> today?
> 

If I have to settle for yes or no, my answer is no. We need at least one
subsystem reset during the conversion. Without my patch applied, things
look like this:

$ git grep -p -B 5 -e subsystem_reset HEAD~1 -- hw/s390x/s390-virtio-ccw.c
HEAD~1:hw/s390x/s390-virtio-ccw.c=static const char *const reset_dev_types[] = {
--
HEAD~1:hw/s390x/s390-virtio-ccw.c-    "s390-sclp-event-facility",
HEAD~1:hw/s390x/s390-virtio-ccw.c-    "s390-flic",
HEAD~1:hw/s390x/s390-virtio-ccw.c-    "diag288",
HEAD~1:hw/s390x/s390-virtio-ccw.c-};
HEAD~1:hw/s390x/s390-virtio-ccw.c-
HEAD~1:hw/s390x/s390-virtio-ccw.c:static void subsystem_reset(void)
--
HEAD~1:hw/s390x/s390-virtio-ccw.c=static void s390_machine_reset(MachineState *machine)
--
HEAD~1:hw/s390x/s390-virtio-ccw.c-    case S390_RESET_MODIFIED_CLEAR:
HEAD~1:hw/s390x/s390-virtio-ccw.c-        /*
HEAD~1:hw/s390x/s390-virtio-ccw.c-         * Susbsystem reset needs to be done before we unshare memory
HEAD~1:hw/s390x/s390-virtio-ccw.c-         * and lose access to VIRTIO structures in guest memory.
HEAD~1:hw/s390x/s390-virtio-ccw.c-         */
HEAD~1:hw/s390x/s390-virtio-ccw.c:        subsystem_reset();
--
HEAD~1:hw/s390x/s390-virtio-ccw.c-    case S390_RESET_LOAD_NORMAL:
HEAD~1:hw/s390x/s390-virtio-ccw.c-        /*
HEAD~1:hw/s390x/s390-virtio-ccw.c-         * Susbsystem reset needs to be done before we unshare memory
HEAD~1:hw/s390x/s390-virtio-ccw.c-         * and lose access to VIRTIO structures in guest memory.
HEAD~1:hw/s390x/s390-virtio-ccw.c-         */
HEAD~1:hw/s390x/s390-virtio-ccw.c:        subsystem_reset();
--
HEAD~1:hw/s390x/s390-virtio-ccw.c-        }
HEAD~1:hw/s390x/s390-virtio-ccw.c-        run_on_cpu(cs, s390_do_cpu_initial_reset, RUN_ON_CPU_NULL);
HEAD~1:hw/s390x/s390-virtio-ccw.c-        run_on_cpu(cs, s390_do_cpu_load_normal, RUN_ON_CPU_NULL);
HEAD~1:hw/s390x/s390-virtio-ccw.c-        break;
HEAD~1:hw/s390x/s390-virtio-ccw.c-    case S390_RESET_PV: /* Subcode 10 */
HEAD~1:hw/s390x/s390-virtio-ccw.c:        subsystem_reset();

That is, except for

hw/s390x/s390-virtio-ccw.c-    case S390_RESET_EXTERNAL:
hw/s390x/s390-virtio-ccw.c-    case S390_RESET_REIPL:
hw/s390x/s390-virtio-ccw.c-        if (s390_is_pv()) {
hw/s390x/s390-virtio-ccw.c-            s390_machine_unprotect(ms);
hw/s390x/s390-virtio-ccw.c-        }
hw/s390x/s390-virtio-ccw.c-
hw/s390x/s390-virtio-ccw.c-        qemu_devices_reset();

which does a qemu_devices_reset(), we already have a subsystem_reset().
But for the cases with a PV transition this reset happens before ms->pv
is changed, so I can't react properly in the callback. For my purposes
the qemu_devices_reset() is sufficient, but I'm not sure.

The qemu_devices_reset() seems to come from db3b2566e0 ("s390x: machine
reset function with new ipl cpu handling") authored by David and
reviewed by you. The subsystem reset comes from 4e872a3fb0 ("s390:
provide I/O subsystem reset") authored by Christian. From a quick look,
I believe what is done by subsystem_reset() should be a proper subset of
what is done by qemu_devices_reset(). Maybe the subsystem_reset() can
just be moved, and the extra subsystem_reset() calls added by me can be
removed. I didn't look into that, because it would have been wasted
effort if the community rejects this approach.

I hope this answers your questions! Thanks for having a look!

Regards,
Halil
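
To make the 'on/off/auto' semantics discussed in this thread a bit more
concrete, here is a minimal, self-contained sketch. It is not QEMU code
and not taken from the patch: the Tristate type and both function names
are hypothetical, and it assumes that 'auto' should resolve to 'on'
exactly when the VM is allowed to enter protected mode, and that an
explicit 'off' on such a VM is the misconfiguration to reject at VM
construction or hotplug.

```c
#include <stdbool.h>

/* Hypothetical tri-state; illustrative only, not QEMU's own type. */
typedef enum {
    TRISTATE_AUTO,
    TRISTATE_ON,
    TRISTATE_OFF,
} Tristate;

/*
 * Resolve the requested setting into a concrete on/off value.
 * Assumption: 'auto' means "on if and only if the VM may go PV".
 */
bool iommu_platform_resolve(Tristate requested, bool pv_allowed)
{
    switch (requested) {
    case TRISTATE_ON:
        return true;
    case TRISTATE_OFF:
        return false;
    case TRISTATE_AUTO:
    default:
        return pv_allowed;
    }
}

/*
 * Check a device configuration at construction or hotplug time.
 * Only an explicit 'off' on a PV-capable VM is refused; 'on' and
 * 'auto' are always accepted.
 */
bool iommu_platform_config_ok(Tristate requested, bool pv_allowed)
{
    return !(pv_allowed && requested == TRISTATE_OFF);
}
```

With this split, a default-generated device that is left at 'auto'
would follow the VM's PV capability, and only the explicit 'off' case
would need an error message.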