Lukas Straub <lukasstra...@web.de> writes:

> On Fri, 8 Aug 2025 11:37:23 -0400
> Peter Xu <pet...@redhat.com> wrote:
>
>> On Fri, Aug 08, 2025 at 10:55:25AM -0300, Fabiano Rosas wrote:
>> > Please work with Lukas to figure out whether yank can be used here. I
>> > think that's the correct approach. If the main loop is blocked, then
>> > some out-of-band cancellation routine is needed. migrate_cancel() could
>> > be it, but at the moment it's not. Yank is the second best thing.
>>
>> I agree.
>>
>> migrate_cancel() should really be an OOB command. It should be a superset
>> of yank features, plus anything migration-specific besides yanking the
>> channels, for example, when the migration thread is blocked in PRE_SWITCHOVER.
>
> Hmm, I think the migration code should handle this properly even if the
> yank command is used. From the POV of migration, it sees that the
> connection broke with connection reset. That is the same error as if the
> other side crashes/is killed or a NAT/stateful firewall in between
> reboots.
>

That should all work just fine, after yank or after a detectable network
failure. The issue here seems to be that the destination recv is hanging
indefinitely. I don't think we ever played with socket timeout
configurations, or even with switching to non-blocking during the sync.
This is actually (AFAIK) the first time we've gotten a hang that's not
"just" a synchronization issue in the migration code.

>>
>> I'll add this into my todo; maybe I can do something with it this release.
>> I'm happy if anyone would beat me to it.
>>
>> >
>> > The need for a timeout is usually indicative of a design issue. In this
>> > case, the choice of a coroutine for the incoming side is the obvious
>> > one. Peter will tell you all about it! =)
>>
>> Nah. :)
>>
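
To make the socket-timeout option mentioned in the reply concrete, here is a minimal sketch of a receive that gives up instead of hanging indefinitely. It is plain Python rather than QEMU code, and `recv_with_timeout` is a hypothetical helper named only for this illustration; it does not say how the migration code would or should do this.

```python
import socket


def recv_with_timeout(sock, nbytes, timeout_s):
    """Receive up to nbytes, or return None if nothing arrives in timeout_s.

    Hypothetical helper for illustration only; not part of QEMU.
    """
    # With a timeout set, recv() raises socket.timeout instead of
    # blocking forever when the peer stays silent.
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None


if __name__ == "__main__":
    # A connected pair stands in for the source and destination sides.
    src, dst = socket.socketpair()

    # The source sends nothing, so the destination gives up after 0.2 s.
    print(recv_with_timeout(dst, 16, 0.2))

    # Once data is actually sent, the same call returns it.
    src.sendall(b"ping")
    print(recv_with_timeout(dst, 16, 0.2))

    src.close()
    dst.close()
```

A timeout like this only bounds how long the receiver waits. It does not explain why the peer went silent, which is consistent with the point quoted above that needing a timeout usually indicates a design issue.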