On Fri, Jan 31, 2025 at 02:42:41PM +0100, Juraj Marcin wrote:
> On 2025-01-30 16:04, Peter Xu wrote:
> > On Thu, Jan 30, 2025 at 05:11:36PM +0100, Juraj Marcin wrote:
> > > When there are no page requests from the destination side and the
> > > connection breaks, the destination might not be aware of it. This is
> > > the case for example with a TCP connection, which by default remains
> > > open unless it is explicitly closed or the TCP stack fails to
> > > successfully send a packet.
> > >
> > > Such a situation without any traffic from the destination to the
> > > source side can happen for multiple reasons. For example, if all
> > > pages accessed at idle are already migrated, but there are still
> > > non-migrated pages that are accessed very infrequently, or if the
> > > destination QEMU is started with the '-S' option and the management
> > > application does not run the 'cont' QMP command right away.
> > >
> > > Due to this, the destination side might not know about the network
> > > failure and stays in the postcopy-active state instead of switching
> > > to the postcopy-paused state, informing the destination management
> > > application about the issue.
> > >
> > > This patch resolves this by sending a keepalive message on the return
> > > path to the source if there is no page fault (and thus no page
> > > request message) in a certain amount of time. This wait time can be
> > > configured with the new 'postcopy-rp-keepalive-period' migration
> > > parameter; by default it is 60000 ms. By setting this parameter to 0,
> > > sending keepalive messages can be disabled completely.
> >
> > Hmm, I see.. Does it mean it would also work if we leverage tcp
> > keepalive events? I just checked on my system, and indeed the default
> > value can at least be too large:
> >
> >   $ sysctl net.ipv4.tcp_keepalive_time
> >   net.ipv4.tcp_keepalive_time = 7200
> >
> > I wonder if it makes more sense to set it to a smaller value. What you
> > proposed here (60s) as the default sounds more reasonable indeed, at
> > least to me.
> >
> > Is that a concern (e.g. tcp keepalive not enabled, or timeout too
> > large), so that maybe it's better we have our own way to time out the
> > session? If so, it may be worthwhile to add some comment on that.
>
> Accepting or relaying TCP keepalive packets is not required by the RFC
> and is not enabled on the sockets used (at least on the destination
> side, which I checked). Therefore, I opted to implement it at a higher
> layer, in the migration protocol itself. I can mention it in the commit
> message.
>
> However, even with a 60s keepalive period, the time it takes for the TCP
> stack to give up trying to send a packet can be around 15-20 minutes.
> This could possibly be configured at the socket level with the
> 'TCP_USER_TIMEOUT' socket option.
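(For reference, a minimal sketch of setting that option on an
already-connected TCP socket; the helper name and the value passed in the
usage note below are only illustrative, not from any existing code:)

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Cap how long the TCP stack keeps retransmitting unacknowledged data
 * before it gives up and errors out the connection (milliseconds). */
static int set_tcp_user_timeout(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}

With e.g. set_tcp_user_timeout(fd, 30000), a broken connection would be
reported after roughly 30 seconds of failed retransmissions instead of
the 15-20 minutes mentioned above.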
I see it's defined as:

#define TCP_USER_TIMEOUT 18 /* How long for loss retry before timeout */

It doesn't look like the knob for keepalive, though?

While at it, I did check the relevant knobs for keepalive, and it looks
like they do exist, and they can also be configured per-socket (and IIUC
that shouldn't be affected by the global configuration).

Then.. instead of this new migration packet, how about we configure TCP
keepalive on the migration sockets?  We may also not need a compat
property entry.

IIUC, we need to tune the knobs below:

#define TCP_KEEPIDLE  4 /* Start keeplives after this period */
#define TCP_KEEPINTVL 5 /* Interval between keepalives */
#define TCP_KEEPCNT   6 /* Number of keepalives before death */

So maybe we can set SO_KEEPALIVE to 1 to enable keepalive, then adjust the
three knobs above to make sure the connection HUPs properly when the
network goes down?

[...]

> > > @@ -2399,6 +2407,10 @@ static void *source_return_path_thread(void *opaque)
> > >              trace_source_return_path_thread_switchover_acked();
> > >              break;
> > >
> > > +        case MIG_RP_MSG_KEEPALIVE:
> > > +            trace_source_return_path_thread_received_keepalive();
> > > +            break;
> > > +
> > >          default:
> > >              break;
> >
> > So we ignore unknown messages on the return path. I didn't expect it,
> > but it looks like it means we can enable this without worrying about
> > compatibility with older QEMU binaries, am I right?
>
> I also thought this and it seemed to be working when I tested it, but
> actually it isn't the case. Due to the 'header_type >= MIG_RP_MSG_MAX'
> condition after reading the header type, the return path on the source
> side fails. So we need to change the default to 0, so it is disabled by
> default.

We figured this out offline.. :)

We'll need a compat property if going with this approach.

-- 
Peter Xu
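A minimal sketch of the per-socket setup suggested above, assuming an
already-connected TCP fd for the migration channel; the helper name and
the timing values are illustrative only, not taken from any existing
QEMU code:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Enable TCP keepalive on a connected socket and tune it so that a dead
 * peer is detected after roughly idle + intvl * cnt seconds of silence
 * (60 + 10 * 6 = ~120 seconds with the example values used here).
 */
static int enable_tcp_keepalive(int fd)
{
    int on = 1;      /* SO_KEEPALIVE: start sending keepalive probes */
    int idle = 60;   /* TCP_KEEPIDLE: seconds of idle before the first probe */
    int intvl = 10;  /* TCP_KEEPINTVL: seconds between probes */
    int cnt = 6;     /* TCP_KEEPCNT: unacked probes before the peer is dead */

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0) {
        return -1;
    }
    return 0;
}

Once the probe count is exhausted the kernel drops the connection and
pending reads/writes on the socket fail (with ETIMEDOUT on Linux), which
should let the source return path notice the broken connection without
any extra migration-level message.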