On 20.02.2017 14:46, Paolo Bonzini wrote:
>
>
> On 16/02/2017 15:51, Janosch Frank wrote:
>> While trying to fix a bug in the s390 migration code, I noticed that
>> QEMU ignores practically all errors returned from that VM ioctl. QEMU
>> behaves as specified in the KVM api and only processes -1 (-EPERM) as an
>> error.
>>
>> Unfortunately the documentation is wrong/old and KVM may return -EFAULT,
>> -EINVAL, -ENOTSUPP (BookE) and -ENOENT. This bugs me, as I found a case
>> where I want to return -EFAULT because of guest memory problems and QEMU
>> will still happily migrate the VM.
>
> Guest memory problems should not return EFAULT, which corresponds to a
> wrong address passed to KVM_GET_DIRTY_LOG. In fact, EFAULT is probably
> the only case where an assertion is warranted---just like you passed a
> wrong pointer to KVM_GET_DIRTY_LOG, who knows who else is going to get
> that pointer.
>
> ENOENT and EINVAL should not kill the source guest, though they should
> terminate migration. But then I would like to know more about this
> case, because they should never happen unless KVMMemoryListener is buggy.
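
To check that I read the proposed policy correctly, here is a minimal
sketch of it in C. The enum and function names are hypothetical and are
not existing QEMU code. It assumes the sync path hands back 0 on success
or a negative errno value, and treating any unlisted error as "fail the
migration" is my own assumption, not something stated above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names for illustration only; none of these exist in QEMU. */
enum dirty_log_action {
    DIRTY_LOG_OK,             /* sync succeeded, keep migrating */
    DIRTY_LOG_FAIL_MIGRATION, /* stop migration, leave the source guest alone */
    DIRTY_LOG_ABORT           /* bad pointer handed to KVM, assertion warranted */
};

/*
 * Map the result of a KVM_GET_DIRTY_LOG call to an action.
 * 'ret' is assumed to be 0 on success or a negative errno value.
 */
enum dirty_log_action dirty_log_classify(int ret)
{
    if (ret >= 0) {
        return DIRTY_LOG_OK;
    }

    switch (-ret) {
    case EFAULT:
        /* Wrong address passed to the ioctl: a caller bug. */
        return DIRTY_LOG_ABORT;
    case ENOENT:
    case EINVAL:
        /* Must not kill the source guest, but migration cannot go on. */
        return DIRTY_LOG_FAIL_MIGRATION;
    default:
        /* Assumption: anything else is handled like ENOENT/EINVAL. */
        return DIRTY_LOG_FAIL_MIGRATION;
    }
}

/*
 * Returns 0 if migration may continue and -1 if it has to be failed.
 * Asserts on the EFAULT case, as suggested above.
 */
int dirty_log_check(int ret)
{
    enum dirty_log_action action = dirty_log_classify(ret);

    assert(action != DIRTY_LOG_ABORT);
    return action == DIRTY_LOG_OK ? 0 : -1;
}
```

With this, dirty_log_check(-EINVAL) would return -1 so the caller can
cancel migration while the guest keeps running on the source.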

It is currently possible to start a hugetlbfs guest on s390 although we
don't have any huge page support. When QEMU starts the VM, it will get a
lot of errors back and pause the VM.

When this VM is then migrated, the host will do pte dirty handling on
huge pages in kvm_s390_sync_dirty_log/test_and_clear_guest_dirty.
Running into such a huge page would be a guest memory error, so EINVAL
it is.

I'll post the patches shortly to give more context.

>
> Paolo
>
>> I currently don't see a reason why we continue to migrate on EFAULT and
>> EINVAL. But returning -error from kvm_physical_sync_dirty_bitmap might
>> also a bit hard, as it kills QEMU.
>>
>> Do we want to fix this and if, how do we want it done?
>> If not we at least have a definitive mail to point to when the next one
>> comes around. I also have a KVM patch to update the api documentation if
>> wanted (maybe we should dust that off a bit anyhow).
>>
>>
>> This has been brought up in 2009 [1] the first time and was more or less
>> fixed and then reverted in 2014 [2].
>>
>> The reason in [1] was that PPC hadn't settled yet on a valid return code.
>>
>> In [2] it was too close to the v2 to handle it properly.
>>
>>
>> [1] https://lists.nongnu.org/archive/html/qemu-devel/2009-07/msg01772.html
>>
>> [2] https://lists.nongnu.org/archive/html/qemu-devel/2014-04/msg01993.html
>>
>>
>> Cheers,
>> Janosch
>>
>