Am 26.05.23 um 15:47 schrieb Fiona Ebner: > Am 26.05.23 um 12:16 schrieb Juan Quintela: >> Nak >> >> Sometimes it works, and sometimes it hangs. > > Sorry, I originally only ran the tests for x86_64 (native for me). I now > ran into the hang too, with qtest-aarch64/migration-test and > qtest-i386/migration-test. > >> Can you take a look? > > Will do! >
So I took a look at the multifd_send_state->params[$i] and noticed that the IOChannel c is still NULL and running is still false, while the name, page_size, etc. have already been initialized. So it seems like that socket_send_channel_create() did not manage to execute multifd_new_send_channel_async() yet, telling from the following in multifd_save_setup(): > p->page_size = qemu_target_page_size(); > p->page_count = page_count; > > if (migrate_zero_copy_send()) { > p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY; > } else { > p->write_flags = 0; > } > > socket_send_channel_create(multifd_new_send_channel_async, p); > } I guess the execution of multifd_new_send_channel_async() after socket_send_channel_create() somehow depends on the main thread doing something? But the main thread is waiting on the BQL. Should I introduce an unlocked section around multifd_send_sync_main() in ram_save_setup() and hope for the best? The tests ran two times without errors for me afterwards. Is there an easy way to only run the single problematic test cases again? There's still the risk there's something else that needs an unlocked section. In that regard, v1 of the patch is safer, because it doesn't change which sections are inside or outside the BQL for migration, just for snapshot. Best Regards, Fiona