Now that Focal is open, I have opened a proper Focal MP replacing the old one, and also an Eoan SRU MP right away.
=> https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+merge/374770
=> https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+merge/374771

** Description changed:

+ [Impact]
+
+  * Due to a bug in qemu 4.0 the config size for virtio-balloon changed.
+  * This breaks migration from pre 4.0 qemu because the PCI BAR size is
+    affected.
+
+  * Upstream has realized this and fixed it in 4.1, this backports the fix
+    to qemu 4.0 in Ubuntu Eoan
+
+ [Test Case]
+
+  * Take a pre-eoan (pre qemu 4.0) guest and check that your setup can
+    migrate it back and forth with a eoan/qemu-4.0 target.
+    Then add a virtio-balloon device to the guest on pre-4.0 and migrate it
+    again.
+    Unfixed, the following error will show up:
+      get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0
+
+  * Unfixed -> Fixed qemu 4.0 migrations should work as well. While the
+    other way around it could (size didn't change), but there are no
+    guarantees (no logic in the target).
+
+ [Regression Potential]
+
+  * Messing with machine types is always dangerous, as in case of a mistake
+    things get even more complex. But in this case things seemed rather
+    straightforward. Pre 4.0 code all behaves the same, only 4.0 gets the
+    new attribute set and later code has logic to handle dynamic sizes.
+    That way I think we are safe from machine-type regressions.
+  * For the change in behavior, it changes pre 4.0 migrations, which atm
+    are broken if a virtio-balloon device is present. There is nothing to
+    break more in that use case, and if such a device isn't present it
+    shouldn't change anything. Therefore IMHO safe again.
+
+ [Other Info]
+
+  * n/a
+
+
+ ---
+
  Related but not the same as bug 1838569 which had two error signatures.
  The first being covered there and the second handled here.

  ---
  ---
  Quote from https://bugs.launchpad.net/cloud-archive/+bug/1838569/comments/4

  Daniel 'f0o' Preussker (dpreussker) wrote 1 hour ago: #4
  With recent release of OpenStack Train this issue reappears...
  Upgrading from Stein to Train will require all VMs to be hard-rebooted to be migrated as a final step because Live Migration fails with:

  Oct 17 10:28:43 h2.1.openstack.r0cket.net libvirtd[1545]: Unable to read from monitor: Connection reset by peer
  Oct 17 10:28:43 h2.1.openstack.r0cket.net libvirtd[1545]: internal error: qemu unexpectedly closed the monitor: 2019-10-17T10:28:42.981201Z qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0
- 2019-10-17T10:28:42.981250Z qemu-system-x86_64: Failed to load PCIDevice:config
- 2019-10-17T10:28:42.981263Z qemu-system-x86_64: Failed to load virtio-balloon:virtio
- 2019-10-17T10:28:42.981272Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-balloon'
- 2019-10-17T10:28:42.981391Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2532609 kHz) and host (2532608 kHz), and TSC scaling unavailable
- 2019-10-17T10:28:42.983157Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2532609 kHz) and host (2532608 kHz), and TSC scaling unavailable
- 2019-10-17T10:28:42.983672Z qemu-system-x86_64: load of migration failed: Invalid argument
-
+ 2019-10-17T10:28:42.981250Z qemu-system-x86_64: Failed to load PCIDevice:config
+ 2019-10-17T10:28:42.981263Z qemu-system-x86_64: Failed to load virtio-balloon:virtio
+ 2019-10-17T10:28:42.981272Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-balloon'
+ 2019-10-17T10:28:42.981391Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2532609 kHz) and host (2532608 kHz), and TSC scaling unavailable
+ 2019-10-17T10:28:42.983157Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2532609 kHz) and host (2532608 kHz), and TSC scaling unavailable
+ 2019-10-17T10:28:42.983672Z qemu-system-x86_64: load of migration failed: Invalid argument

  ---
  ---
-
  Identified as:

  Dr. David Alan Gilbert (dgilbert-h) wrote 1 hour ago: #5
  Daniel:
  That's a different problem; 'Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0'; so should probably be a separate bug.
  I'd bet on this being the one fixed by 2bbadb08ce272d65e1f78621002008b07d1e0f03
-
  ---
  ---

  And that is a fix that only is in qemu 4.1 and would be an open bug for Ubuntu and Cloud Archive

** Description changed:

  [Impact]

-  * Due to a bug in qemu 4.0 the config size for virtio-balloon changed.
-  * This breaks migration from pre 4.0 qemu because the PCI BAR size is
-    affected.
+  * Due to a bug in qemu 4.0 the config size for virtio-balloon changed.
+  * This breaks migration from pre 4.0 qemu because the PCI BAR size is
+    affected.

-  * Upstream has realized this and fixed it in 4.1, this backports the fix
-    to qemu 4.0 in Ubuntu Eoan
+  * Upstream has realized this and fixed it in 4.1, this backports the fix
+    to qemu 4.0 in Ubuntu Eoan

  [Test Case]

-  * Take a pre-eoan (pre qemu 4.0) guest and check that your setup can
-    migrate it back and forth with a eoan/qemu-4.0 target.
-    Then add a virtio-balloon device to the guest on pre-4.0 and migrate it
-    again.
-    Unfixed, the following error will show up:
-      get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0
+  * Take a pre-eoan (pre qemu 4.0) guest and check that your setup can
+    migrate it back and forth with a eoan/qemu-4.0 target.
+    Note: (always) use a versioned machine type like pc-i440fx-disco (also
+    the default if you use disco as source).
+    Then add a virtio-balloon device to the guest on pre-4.0 and migrate it
+    again.
+    Unfixed, the following error will show up:
+      get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0

-  * Unfixed -> Fixed qemu 4.0 migrations should work as well. While the
-    other way around it could (size didn't change), but there are no
-    guarantees (no logic in the target).
+  * Unfixed -> Fixed qemu 4.0 migrations should work as well. While the
+    other way around it could (size didn't change), but there are no
+    guarantees (no logic in the target).

  [Regression Potential]

-  * Messing with machine types is always dangerous, as in case of a mistake
-    things get even more complex. But in this case things seemed rather
-    straightforward. Pre 4.0 code all behaves the same, only 4.0 gets the
-    new attribute set and later code has logic to handle dynamic sizes.
-    That way I think we are safe from machine-type regressions.
-  * For the change in behavior, it changes pre 4.0 migrations, which atm
-    are broken if a virtio-balloon device is present. There is nothing to
-    break more in that use case, and if such a device isn't present it
-    shouldn't change anything. Therefore IMHO safe again.
+  * Messing with machine types is always dangerous, as in case of a mistake
+    things get even more complex. But in this case things seemed rather
+    straightforward. Pre 4.0 code all behaves the same, only 4.0 gets the
+    new attribute set and later code has logic to handle dynamic sizes.
+    That way I think we are safe from machine-type regressions.
+  * For the change in behavior, it changes pre 4.0 migrations, which atm
+    are broken if a virtio-balloon device is present. There is nothing to
+    break more in that use case, and if such a device isn't present it
+    shouldn't change anything. Therefore IMHO safe again.

  [Other Info]
-
-  * n/a
+  * n/a

  ---
-
  Related but not the same as bug 1838569 which had two error signatures.
  The first being covered there and the second handled here.

  ---
  ---
  Quote from https://bugs.launchpad.net/cloud-archive/+bug/1838569/comments/4

  Daniel 'f0o' Preussker (dpreussker) wrote 1 hour ago: #4
  With recent release of OpenStack Train this issue reappears...
  Upgrading from Stein to Train will require all VMs to be hard-rebooted to be migrated as a final step because Live Migration fails with:

  Oct 17 10:28:43 h2.1.openstack.r0cket.net libvirtd[1545]: Unable to read from monitor: Connection reset by peer
  Oct 17 10:28:43 h2.1.openstack.r0cket.net libvirtd[1545]: internal error: qemu unexpectedly closed the monitor: 2019-10-17T10:28:42.981201Z qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0
  2019-10-17T10:28:42.981250Z qemu-system-x86_64: Failed to load PCIDevice:config
  2019-10-17T10:28:42.981263Z qemu-system-x86_64: Failed to load virtio-balloon:virtio
  2019-10-17T10:28:42.981272Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-balloon'
  2019-10-17T10:28:42.981391Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2532609 kHz) and host (2532608 kHz), and TSC scaling unavailable
  2019-10-17T10:28:42.983157Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2532609 kHz) and host (2532608 kHz), and TSC scaling unavailable
  2019-10-17T10:28:42.983672Z qemu-system-x86_64: load of migration failed: Invalid argument

  ---
  ---
  Identified as:

  Dr. David Alan Gilbert (dgilbert-h) wrote 1 hour ago: #5
  Daniel:
  That's a different problem; 'Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0'; so should probably be a separate bug.
  I'd bet on this being the one fixed by 2bbadb08ce272d65e1f78621002008b07d1e0f03
  ---
  ---

  And that is a fix that only is in qemu 4.1 and would be an open bug for Ubuntu and Cloud Archive

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1848497

Title:
  virtio-balloon change breaks migration from qemu prior to 4.0

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1848497/+subscriptions

--
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
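
For anyone trying to read the error signature quoted in this bug ("Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0"), here is a minimal Python sketch that decodes it. It assumes the check has the form (read ^ device) & cmask & ~wmask & ~w1cmask, which is how I understand get_pci_config_device in QEMU's hw/pci/pci.c; the helper names are made up for illustration and are not QEMU API. Offset 0x10 is BAR0 in a standard PCI config header.

```python
def mismatched_bits(read: int, device: int, cmask: int, wmask: int, w1cmask: int) -> int:
    """Bits that differ between the incoming and the local config byte and are
    checked (cmask) but neither guest-writable (wmask) nor write-1-to-clear.
    Assumption: this mirrors the comparison done on the migration target."""
    return (read ^ device) & cmask & ~wmask & ~w1cmask & 0xFF


def bar_size_from_wmask(wmask_low_byte: int) -> int:
    """Lowest writable address bit in the BAR's low byte. For a BAR smaller
    than 256 bytes this equals the BAR size (hypothetical helper)."""
    return wmask_low_byte & -wmask_low_byte


# Values copied from the error message quoted in this bug (i=0x10 is BAR0).
read, device, cmask, wmask, w1cmask = 0xA1, 0x01, 0xFF, 0xC0, 0x00

bad = mismatched_bits(read, device, cmask, wmask, w1cmask)
print(f"offending bits: {bad:#04x}")
print(f"target BAR0 size implied by wmask: {bar_size_from_wmask(wmask)} bytes")
print(f"BAR0 is an I/O BAR: {bool(device & 0x01)}")
```

Under these assumptions the offending bit is 0x20: the source had address bit 5 programmed in BAR0, which the target treats as hardwired to zero because its BAR is 64 bytes. That would be consistent with the source BAR being smaller than 64 bytes, i.e. with the BAR size change described under [Impact].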
