Re: [PATCH] s390x: Clear RAM on diag308 subcode 3 reset

David Hildenbrand (Red Hat) Tue, 11 Nov 2025 07:50:13 -0800

On 11.11.25 15:55, Christian Borntraeger wrote:


On 11.11.25 at 14:37, David Hildenbrand (Red Hat) wrote:

     /*
      * Temporarily drop the record/replay mutex to let rr_cpu_thread_fn()
@@ -479,6 +480,7 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
     switch (reset_type) {
     case S390_RESET_EXTERNAL:
     case S390_RESET_REIPL:
+    case S390_RESET_REIPL_CLEAR:
         /*
          * Reset the subsystem which includes a AP reset. If a PV
          * guest had APQNs attached the AP reset is a prerequisite to
@@ -489,6 +491,10 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
             s390_machine_unprotect(ms);
         }
+        if (reset_type == S390_RESET_REIPL_CLEAR) {
+            ram_block_discard_range(rb, 0 , qemu_ram_get_used_length(rb));
+        }
+

...




Am I right that this patch never made it into QEMU master? IIRC, Matt has
addressed all the concerns?


I was hoping to see a reply from David that he's fine with the patch now...
David?


Staring at this again, one more point regarding userfaultfd: doing the discard 
on the destination while postcopy is active might be problematic.

I don't remember all the details, but I think that if we have the following:

1) Migrate page X to dst
2) Discard page X on dst
3) Access page X on dst

then postcopy_request_page()->migrate_send_rp_req_pages() would assume that the
page had already been transferred (it was marked as received in the receive
bitmap in step 1) and would essentially never place a fresh zeropage in step 3,
so the access would be stuck forever.
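To make the 1) 2) 3) sequence concrete, here is a toy model of the destination side. It is a hypothetical simplification, not QEMU code: the struct, the functions and the "request = immediate placement" shortcut are all assumptions. It only shows why a page that is unmapped after being marked received is never fetched again.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_PAGES 8

/* Hypothetical, simplified model of a postcopy destination (not QEMU code). */
struct dst_state {
    bool received[NR_PAGES];   /* stands in for the receive bitmap */
    bool populated[NR_PAGES];  /* whether a page is currently mapped */
};

static void dst_init(struct dst_state *s)
{
    memset(s, 0, sizeof(*s));
}

/* Step 1: the source sent the page; place it and mark it as received. */
static void place_page(struct dst_state *s, int pfn)
{
    assert(pfn >= 0 && pfn < NR_PAGES);
    s->populated[pfn] = true;
    s->received[pfn] = true;
}

/* Step 2: a discard (e.g. on reset) unmaps the page but leaves the bitmap alone. */
static void discard_page(struct dst_state *s, int pfn)
{
    assert(pfn >= 0 && pfn < NR_PAGES);
    s->populated[pfn] = false;
}

/*
 * Step 3: the guest touches the page. Returns true if the access can
 * complete, false if the faulting vCPU would wait forever.
 */
static bool guest_access(struct dst_state *s, int pfn)
{
    assert(pfn >= 0 && pfn < NR_PAGES);
    if (s->populated[pfn]) {
        return true;                /* no fault at all */
    }
    if (s->received[pfn]) {
        /* Already marked received: nothing is requested from the source
         * and no zeropage is placed, so the fault is never resolved. */
        return false;
    }
    /* Not received yet: request it (modelled as immediate placement). */
    place_page(s, pfn);
    return true;
}
```

In this model a page that was never sent is fetched on demand, while a page that went through steps 1 and 2 makes guest_access() return false.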

Can postcopy be running while we are in a system reset?


Yes, that should be possible: start postcopy and then trigger a system reset on
the destination (e.g., from the guest).

Or, as an alternative, can we check whether postcopy is running and skip the
discard in that case?


Another problematic interaction might be with a background snapshot (another
form of migration) running concurrently. If we discard after populating all
memory and registering it with userfaultfd-wp, I think we might not get write
events for all changes, possibly corrupting the snapshot (not 100% sure, but
that's what I remember).
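A toy model of that concern, under the assumption (matching the "not 100% sure" above) that a discard drops the write protection together with the page content and raises no write event. All names are hypothetical; this is neither QEMU's background-snapshot code nor the kernel's userfaultfd implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_PAGES 4

/* Hypothetical, simplified model of a background snapshot that relies on
 * write-protect faults to see guest writes (not QEMU or kernel code). */
struct snap_state {
    int mem[NR_PAGES];    /* current guest memory, one int per page */
    bool wp[NR_PAGES];    /* page is write-protected, i.e. writes are tracked */
    bool saved[NR_PAGES]; /* page content already copied into the snapshot */
    int snap[NR_PAGES];   /* snapshot contents */
};

static void snap_init(struct snap_state *s)
{
    memset(s, 0, sizeof(*s));
}

/* Copy a page into the snapshot unless it was already copied. */
static void save_page(struct snap_state *s, int pfn)
{
    if (!s->saved[pfn]) {
        s->snap[pfn] = s->mem[pfn];
        s->saved[pfn] = true;
    }
}

/* Snapshot start: memory is populated, then every page is write-protected. */
static void snap_start(struct snap_state *s)
{
    for (int i = 0; i < NR_PAGES; i++) {
        s->wp[i] = true;
        s->saved[i] = false;
    }
}

/* Guest write: the write-protect fault (the "write event") lets the
 * snapshot copy the old content before the write goes through. */
static void guest_write(struct snap_state *s, int pfn, int val)
{
    assert(pfn >= 0 && pfn < NR_PAGES);
    if (s->wp[pfn]) {
        save_page(s, pfn);
        s->wp[pfn] = false;
    }
    s->mem[pfn] = val;
}

/* Discard, as assumed here: content is dropped together with the
 * protection, and no write event is generated. */
static void discard_page(struct snap_state *s, int pfn)
{
    assert(pfn >= 0 && pfn < NR_PAGES);
    s->mem[pfn] = 0;
    s->wp[pfn] = false;
}

/* The snapshot thread eventually copies every page not saved yet. */
static void snap_finish(struct snap_state *s)
{
    for (int i = 0; i < NR_PAGES; i++) {
        save_page(s, i);
    }
}
```

With a plain guest write the snapshot keeps the content from snapshot start; with a discard in between, it ends up holding data written after the snapshot began.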


What virtio-mem does to work around all that is the following:

static bool virtio_mem_is_busy(void)
{
    /*
     * Postcopy cannot handle concurrent discards and we don't want to migrate
     * pages on-demand with stale content when plugging new blocks.
     *
     * For precopy, we don't want unplugged blocks in our migration stream, and
     * when plugging new blocks, the page content might differ between source
     * and destination (observable by the guest when not initializing pages
     * after plugging them) until we're running on the destination (as we didn't
     * migrate these blocks when they were unplugged).
     */
    return migration_in_incoming_postcopy() || migration_is_running();
}
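Applied to the diag308 subcode 3 path, the same kind of check could gate the discard. The sketch below is hypothetical: plain flags stand in for migration_in_incoming_postcopy() and migration_is_running(), memset() stands in for ram_block_discard_range(), and simply skipping the clear while busy follows the "not discard in that case" suggestion rather than any agreed design.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-ins for the migration state queries that
 * virtio_mem_is_busy() uses (migration_in_incoming_postcopy() and
 * migration_is_running() in QEMU). Plain flags so the sketch compiles alone.
 */
static bool incoming_postcopy_active;
static bool migration_running;

/* Mirrors virtio_mem_is_busy(): any migration activity makes a discard unsafe. */
static bool reset_discard_is_busy(void)
{
    return incoming_postcopy_active || migration_running;
}

/*
 * Hypothetical clear-on-reset path. Returns true if RAM was cleared.
 * memset() stands in for ram_block_discard_range(); skipping the clear
 * while busy is only the suggestion from this thread.
 */
static bool reipl_clear_ram(unsigned char *ram, size_t len)
{
    assert(ram != NULL);
    if (reset_discard_is_busy()) {
        return false;
    }
    memset(ram, 0, len);
    return true;
}
```

Note the trade-off this makes visible: while any migration is active, a subcode 3 reset in this sketch leaves guest RAM untouched.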



--
Cheers

David
