Lukas Straub <lukasstra...@web.de> writes:

> On Fri, 8 Aug 2025 11:37:23 -0400
> Peter Xu <pet...@redhat.com> wrote:
>
>> On Fri, Aug 08, 2025 at 10:55:25AM -0300, Fabiano Rosas wrote:
>> > Please work with Lukas to figure out whether yank can be used here. I
>> > think that's the correct approach. If the main loop is blocked, then
>> > some out-of-band cancellation routine is needed. migrate_cancel() could
>> > be it, but at the moment it's not. Yank is the second best thing.  
>> 
>> I agree.
>> 
>> migrate_cancel() should really be an OOB command..  It should be a superset
>> of yank features, plus anything migration specific besides yanking the
>> channels, for example, when migration thread is blocked in PRE_SWITCHOVER.
>
> Hmm, I think the migration code should handle this properly even if the
> yank command is used. From the POV of migration, it sees that the
> connection broke with connection reset. That is the same error as if the
> other side crashes/is killed or a NAT/stateful firewall in between
> reboots.
>

That should all work just fine after yank or after a detectable network
failure. The issue here seems to be that the destination recv is hanging
indefinitely. I don't think we have ever experimented with socket timeout
configurations, or even with switching to non-blocking during the sync.
This is actually (AFAIK) the first time we get a hang that's not "just" a
synchronization issue in the migration code.
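
To illustrate what a socket timeout would look like, here is a minimal
sketch in plain POSIX C. It is not QEMU code: set_recv_timeout_ms() and
recv_times_out() are hypothetical names made up for this example, and a
socketpair stands in for the migration channel. It only shows that with
SO_RCVTIMEO set, a recv() on a silent peer returns an error instead of
hanging forever.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a QEMU API): bound how long recv() may block. */
static int set_recv_timeout_ms(int fd, long ms)
{
    struct timeval tv = {
        .tv_sec = ms / 1000,
        .tv_usec = (ms % 1000) * 1000,
    };

    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/*
 * Hypothetical demo: the peer end of the socketpair never sends anything,
 * mimicking a source that went silent without resetting the connection.
 * Returns 1 if recv() gave up with a timeout, 0 if it did not,
 * -1 on setup failure.
 */
static int recv_times_out(void)
{
    int sv[2];
    char buf[16];
    ssize_t n;
    int timed_out;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (set_recv_timeout_ms(sv[0], 200) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Without the timeout above, this call would block indefinitely. */
    n = recv(sv[0], buf, sizeof(buf), 0);
    timed_out = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));

    close(sv[0]);
    close(sv[1]);
    return timed_out;
}
```

Whether a timeout like this is the right fix is a separate question; the
sketch only shows the mechanism, not how it would fit into the incoming
coroutine.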

>> 
>> I'll add this into my todo; maybe I can do something with it this release.
>> I'm happy if anyone would beat me to it.
>> 
>> > 
>> > The need for a timeout is usually indicative of a design issue. In this
>> > case, the choice of a coroutine for the incoming side is the obvious
>> > one. Peter will tell you all about it! =)  
>> 
>> Nah. :)
>> 
