On 11.11.25 15:55, Christian Borntraeger wrote:
On 11.11.25 at 14:37, David Hildenbrand (Red Hat) wrote:
/*
* Temporarily drop the record/replay mutex to let rr_cpu_thread_fn()
@@ -479,6 +480,7 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
switch (reset_type) {
case S390_RESET_EXTERNAL:
case S390_RESET_REIPL:
+ case S390_RESET_REIPL_CLEAR:
/*
* Reset the subsystem which includes a AP reset. If a PV
* guest had APQNs attached the AP reset is a prerequisite to
@@ -489,6 +491,10 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
s390_machine_unprotect(ms);
}
+ if (reset_type == S390_RESET_REIPL_CLEAR) {
+ ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
+ }
+
...
Am I right that this patch never made it into QEMU master? IIRC
Matt has addressed all concerns?
I was hoping to see a reply from David that he's fine with the patch now...
David?
Staring at this again, one more point regarding userfaultfd: doing the discard
on the destination while postcopy is active might be problematic.
I don't remember all the details, but I think that if we have the following:
1) Migrate page X to dst
2) Discard page X on dst
3) Access page X on dst
then postcopy_request_page()->migrate_send_rp_req_pages() would assume that the
page was already transferred (it was marked received in the receive bitmap
during step 1) and would never place a fresh zero page during step 3, so the
access would be stuck forever.
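To make the sequence above concrete, here is a toy model in C. It is only a sketch of the behaviour as I remember it, not QEMU code: struct toy_dst and all toy_* functions are hypothetical names, and the real postcopy fault handling is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 8

/* Toy destination state; these are NOT QEMU's data structures. */
struct toy_dst {
    bool received[TOY_NR_PAGES];  /* models the receive bitmap */
    bool populated[TOY_NR_PAGES]; /* page is currently backed by memory */
};

/* Step 1: a page arrives from the source and is placed. */
static void toy_place_page(struct toy_dst *d, size_t page)
{
    assert(page < TOY_NR_PAGES);
    d->received[page] = true;
    d->populated[page] = true;
}

/* Step 2: the discard drops the backing page but leaves the bitmap alone. */
static void toy_discard_page(struct toy_dst *d, size_t page)
{
    assert(page < TOY_NR_PAGES);
    d->populated[page] = false;
}

/*
 * Step 3: the guest touches the page. Returns true if the access can make
 * progress, false if the fault would never be resolved.
 */
static bool toy_access_page(struct toy_dst *d, size_t page)
{
    assert(page < TOY_NR_PAGES);
    if (d->populated[page]) {
        return true; /* no fault at all */
    }
    if (!d->received[page]) {
        /* Fault on a missing page: request it and place it on arrival. */
        toy_place_page(d, page);
        return true;
    }
    /* Fault on a page that counts as received: nobody places it again. */
    return false;
}

/* Runs steps 1-3 for one page; returns whether step 3 makes progress. */
static bool toy_run_scenario(bool discard)
{
    struct toy_dst d = { 0 };

    toy_place_page(&d, 3);
    if (discard) {
        toy_discard_page(&d, 3);
    }
    return toy_access_page(&d, 3);
}
```

In this model toy_run_scenario(false) makes progress, while toy_run_scenario(true) ends up in the "stuck" branch, because the discard in step 2 does not reset the received state.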
Can we have a postcopy running while we are in system reset?
Yes, that should be possible. Start postcopy and then trigger a system reset on
the destination (e.g., from the guest).
Or, as an alternative, can we check whether postcopy is running and not discard
in that case?
Another interaction might be with background snapshots (another form of
migration) running concurrently. If we discard after populating all memory and
registering userfaultfd-wp, I think we might not get write events for all
changes, possibly corrupting the snapshot (not 100% sure, but that's what I
remember).
What virtio-mem does to work around all that is the following:
static bool virtio_mem_is_busy(void)
{
    /*
     * Postcopy cannot handle concurrent discards and we don't want to migrate
     * pages on-demand with stale content when plugging new blocks.
     *
     * For precopy, we don't want unplugged blocks in our migration stream, and
     * when plugging new blocks, the page content might differ between source
     * and destination (observable by the guest when not initializing pages
     * after plugging them) until we're running on the destination (as we didn't
     * migrate these blocks when they were unplugged).
     */
    return migration_in_incoming_postcopy() || migration_is_running();
}
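
Applied to the reset path, the same kind of check could gate the discard. The following is only a self-contained sketch with stand-ins: struct toy_migration_state, toy_discard_is_busy() and toy_reset_clear_ram() are hypothetical, and the two flags merely model what migration_in_incoming_postcopy() and migration_is_running() would report. What to do when the check says "busy" (skip, defer, or clear the memory some other way) is an assumption here and is not settled in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the migration state that QEMU would be queried for. */
struct toy_migration_state {
    bool incoming_postcopy; /* models migration_in_incoming_postcopy() */
    bool running;           /* models migration_is_running() */
};

/* Same predicate shape as virtio_mem_is_busy(), on the toy state. */
static bool toy_discard_is_busy(const struct toy_migration_state *s)
{
    return s->incoming_postcopy || s->running;
}

/*
 * Models the S390_RESET_REIPL_CLEAR branch. Returns the number of bytes
 * that would be discarded, or 0 if the discard is skipped.
 */
static size_t toy_reset_clear_ram(const struct toy_migration_state *s,
                                  size_t used_length)
{
    if (toy_discard_is_busy(s)) {
        /* Assumption: skip the discard while a migration is in flight. */
        return 0;
    }
    /* The real code would discard the range [0, used_length) here. */
    return used_length;
}
```

With no migration active the whole used length is discarded; with either flag set the sketch returns 0 and leaves the memory untouched.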
--
Cheers
David