On Wed, Apr 03, 2024 at 04:35:35PM +0800, Wang, Lei wrote:
> We should change the following line from
>
> 	while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 100)) {
>
> to
>
> 	while (qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 100)) {

Stupid me.. :(  Thanks for figuring this out.

> After that fix, test passed and no segfault.
>
> Given that the test shows a yield to the main loop won't introduce much
> overhead (<1ms), how about first yield unconditionally, then we enter the
> while loop to wait for several ms and yield periodically?

Shouldn't the expectation be that this returns immediately, without a
wait?  We're already processing the LISTEN command, and on the source, as
you said, that was sent well after the connect().  That doesn't guarantee
the ordering, but IIUC the majority of cases should still be a direct hit?

What we can do, though, is reduce the 100ms timeout if you see a risk of
an overly large downtime when we do miss by accident.  We could even use a
tight loop here, considering downtime is important, but as a middle
ground: how about 100ms -> 1ms poll?

If you agree (and Wei too; please review this and comment if there's
anything!), would you write up the commit log, fully test it in whatever
way you can, and resend it as a formal patch (please do this before Friday
if possible)?  You can keep a "Suggested-by:" for me.  I want to queue it
for rc3 if it can make it.  It seems important if Wei can always reproduce
it.

Thanks,

-- 
Peter Xu