On Mon, Nov 06, 2023 at 10:03:17PM +0000, William Roche wrote:
> From: William Roche <william.ro...@oracle.com>
>
> Note about ARM specificities:
> This code has a small part impacting more specifically ARM machines,
> that's the reason why I added qemu-...@nongnu.org -- see description.
>
> A Qemu VM can survive a memory error, as qemu can relay the error to
> the VM kernel, which can also deal with it -- poisoning/off-lining the
> impacted page.
> This situation creates a hole in the VM memory address space that the
> VM kernel knows about (an unreadable page or set of pages).
>
> But the migration of this VM (live migration through the network or
> pseudo-migration with the creation of a state file) will crash Qemu
> when it sequentially reads the memory address space and stumbles on
> the existing hole.
>
> In order to thoroughly correct this problem, the poison information
> should follow the migration, which presents several difficulties:
> - poisoning a page on the destination machine to replicate the source
>   poison requires CAP_SYS_ADMIN privileges, and the qemu process may
>   not always run as a root process
> - the destination kernel needs to be configured with
>   CONFIG_MEMORY_FAILURE
> - the poison information would require a memory transfer protocol
>   enhancement to provide this information
> (The current patches don't provide any of that.)
>
> But if we rely on the fact that a running VM kernel correctly deals
> with the memory poison it is informed about -- marking the poisoned
> page as inaccessible -- we can count on the VM kernel to make sure
> that poisoned pages are not used, even after a migration.
> In this case, I suggest treating the poisoned pages as if they were
> zero-pages for the migration copy (see the first sketch below).
> This fix also works with underlying large pages, taking into account
> the RAMBlock segment "page-size".
>
> That leaves one case we still have to deal with: a memory error that
> is reported to qemu but not injected into the running kernel...
> As the migration turns a poisoned page into an all-zero page, if the
> VM kernel doesn't prevent access to this page, a memory read that
> would generate a BUS_MCEERR_AR error on the source platform could be
> reading zeros on the destination. This is a memory corruption.
>
> So we have to ensure that all poisoned pages we set to zero are known
> by the running kernel. But we have a problem with platforms where
> BUS_MCEERR_AO errors are ignored, which means that qemu knows about
> the poison but the VM doesn't. For the moment this is only the case
> for ARM, but it could later also be needed for AMD VMs.
> See https://lore.kernel.org/all/20230912211824.90952-3-john.al...@amd.com/
>
> In order to avoid this possible silent data corruption, we should
> prevent the migration when we know that a poisoned page is unknown to
> the VM (see the second sketch at the end of this mail).
>
> This is, in my view, the smallest fix we need to avoid qemu crashes on
> migration after a handled memory error, without introducing a possible
> corruption situation.
>
> This fix is scripts/checkpatch.pl clean.
> Unit test: Migration blocking successfully tested on ARM -- an
> injected AO error blocks it. On x86, the same type of error being
> relayed doesn't block.
>
> v2:
>   - adding compressed transfer handling of poisoned pages
>
> v3:
>   - Included the Reviewed-by and Tested-by information on first patch
>   - added a TODO comment above control_save_page()
>     mentioning Zhijian's feedback about RDMA migration failure.
> > v4: > - adding a patch to deal with unknown poison tracking (impacting ARM) > (not using migrate_add_blocker as this is not devices related and > we want to avoid the interaction with --only-migratable mechanism) > > v5: > - Updating the code to the latest version > - adding qemu-...@nongnu.org for a complementary review > > > William Roche (2): > migration: skip poisoned memory pages on "ram saving" phase > migration: prevent migration when a poisoned page is unknown from the > VM
I hope someone from the arch-specific side can have a quick look at
patch 2.

One thing to mention is that, unfortunately, waiting on patch 2 means
we'll miss this release.  Actually, it is already missed: soft freeze
was yesterday [1], so this will likely need to wait for 9.0.

[1] https://wiki.qemu.org/Planning/8.2

-- 
Peter Xu
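Since patch 2 is the one awaiting an arch-specific look, here is a
similarly standalone sketch of its idea: when a poison report (e.g. a
BUS_MCEERR_AO that ARM currently cannot inject) never reaches the
guest, remember that fact and refuse to start a migration rather than
risk silently replacing a poisoned page with zeros. Again, the names
(unknown_poison_present, record_hw_memory_error,
migration_poison_check) are hypothetical and only illustrate the
check, not the actual patch:

/* Minimal sketch of "block migration while a poisoned page is unknown
 * to the VM".  Hypothetical names; not the actual QEMU patch. */
#include <stdbool.h>
#include <stdio.h>

/* Set when qemu learns about a poisoned page that the guest was never
 * told about (e.g. an ignored BUS_MCEERR_AO report on ARM). */
static bool unknown_poison_present;

static void record_hw_memory_error(bool injected_into_guest)
{
    if (!injected_into_guest) {
        unknown_poison_present = true;
    }
}

/* Called at migration setup: fail when migrating would hide a poisoned
 * page from the destination guest and open a corruption window. */
static bool migration_poison_check(const char **errmsg)
{
    if (unknown_poison_present) {
        *errmsg = "cannot migrate: a hardware memory error was never "
                  "injected into the guest";
        return false;
    }
    return true;
}

int main(void)
{
    const char *err = NULL;

    record_hw_memory_error(false);   /* a poison report the guest never saw */
    if (!migration_poison_check(&err)) {
        fprintf(stderr, "%s\n", err);
        return 1;
    }
    return 0;
}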