On Wed, 2023-04-12 at 23:01 +0200, Juan Quintela wrote:
> Nina Schoetterl-Glausch <n...@linux.ibm.com> wrote:
> > Hi,
> >
> > We're seeing failures running s390x migration kvm-unit-tests tests with TCG.
>
> As this is TCG, could you tell me the exact command that you are running?
> It needs to be on an s390x host, right?
I've just tried with a cross-compiled kvm-unit-tests and that fails, too:

git clone https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git
cd kvm-unit-tests/
./configure --cross-prefix=s390x-linux-gnu- --arch=s390x
make
for i in {0..30}; do echo $i; QEMU=../qemu/build/qemu-system-s390x ACCEL=tcg ./run_tests.sh migration-skey-sequential | grep FAIL && break; done

> $ time ./tests/qtest/migration-test

I haven't checked whether that test fails at all; we just noticed the problem
with the kvm-unit-tests.

> # random seed: R02S940c4f22abc48b14868566639d3d6c77
> # Skipping test: s390x host with KVM is required
> 1..0
>
> real    0m0.003s
> user    0m0.002s
> sys     0m0.001s
>
> > Some initial findings:
> > What seems to be happening is that after migration a control block
> > header accessed by the test code is all zeros, which causes an
> > unexpected exception.
>
> What exception?
>
> What do you mean here by control block header?

It's all s390x test guest specific stuff, so I don't expect it to be too
helpful: the guest gets a specification exception program interrupt while
executing a SERVC, because the SCCB control block is invalid. See
https://gitlab.com/qemu-project/qemu/-/issues/1565 for a code snippet.
The guest sets a number of fields in the SCCB header, but by the time TCG
emulates the SERVC they are all zero, which doesn't make sense.

> > I did a bisection which points to c8df4a7aef ("migration: Split
> > save_live_pending() into state_pending_*") as the culprit.
> > The migration issue persists after applying the fix e264705012 ("migration:
> > I messed state_pending_exact/estimate") on top of c8df4a7aef.
> >
> > Applying
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 56ff9cd29d..2dc546cf28 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -3437,7 +3437,7 @@ static void ram_state_pending_exact(void *opaque,
> >                                      uint64_t max_size,
> >
> >      uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> >
> > -    if (!migration_in_postcopy()) {
> > +    if (!migration_in_postcopy() && remaining_size < max_size) {
>
> If the block is all zeros, then remaining_size should be zero, so always
> smaller than max_size.
>
> I don't really fully understand what is going on here.
>
> >          qemu_mutex_lock_iothread();
> >          WITH_RCU_READ_LOCK_GUARD() {
> >              migration_bitmap_sync_precopy(rs);
> >
> > on top fixes or hides the issue. (The comparison was removed by c8df4a7aef.)
> > I arrived at this by experimentation; I haven't looked into why it makes
> > a difference.
> >
> > Any thoughts on the matter appreciated.
>
> Later, Juan.
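
For anyone trying to follow along, this is roughly how ram_state_pending_exact()
reads with the comparison put back. It is reconstructed from the hunk above plus
my recollection of the pre-split ram_save_pending() code, so the parameter names
and the accounting at the end are an approximation of the tree at c8df4a7aef,
not a verbatim copy:

/* Sketch only: reconstructed from the quoted hunk; surrounding lines are approximate. */
static void ram_state_pending_exact(void *opaque, uint64_t max_size,
                                    uint64_t *res_precopy_only,
                                    uint64_t *res_compatible,
                                    uint64_t *res_postcopy_only)
{
    RAMState **temp = opaque;
    RAMState *rs = *temp;

    uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;

    if (!migration_in_postcopy() && remaining_size < max_size) {
        /*
         * Restored behaviour: only re-sync the dirty bitmap (which needs
         * the iothread lock) once the previous estimate has dropped below
         * max_size, instead of on every call.
         */
        qemu_mutex_lock_iothread();
        WITH_RCU_READ_LOCK_GUARD() {
            migration_bitmap_sync_precopy(rs);
        }
        qemu_mutex_unlock_iothread();
        remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
    }

    if (migrate_postcopy_ram()) {
        /* All RAM is postcopiable, so account it as such. */
        *res_postcopy_only += remaining_size;
    } else {
        *res_precopy_only += remaining_size;
    }
}

So the only behavioural difference is how often migration_bitmap_sync_precopy()
runs while the migration thread decides whether it can converge; whether those
extra syncs are what breaks things, or whether they merely change the timing
enough to expose the bug, is exactly what I haven't looked into yet.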