On 2025-01-31 10:03, Peter Xu wrote:
> On Fri, Jan 31, 2025 at 02:42:41PM +0100, Juraj Marcin wrote:
> > On 2025-01-30 16:04, Peter Xu wrote:
> > > On Thu, Jan 30, 2025 at 05:11:36PM +0100, Juraj Marcin wrote:
> > > > When there are no page requests from the destination side and the
> > > > connection breaks, the destination might not be aware of it. This
> > > > is the case for example with a TCP connection, which by default
> > > > remains open unless it is explicitly closed or the TCP stack fails
> > > > to successfully send a packet.
> > > >
> > > > Such a situation without any traffic from the destination to the
> > > > source side can happen for multiple reasons. For example, if all
> > > > pages accessed at idle are already migrated, but there are still
> > > > non-migrated pages that are accessed very infrequently, or if the
> > > > destination QEMU is started with the '-S' option and the
> > > > management application does not run the 'cont' QMP command right
> > > > away.
> > > >
> > > > Due to this, the destination side might not know about the network
> > > > failure and stays in the postcopy-active state instead of
> > > > switching to the postcopy-paused state, informing the destination
> > > > management application about the issue.
> > > >
> > > > This patch resolves this by sending a keepalive message on the
> > > > return path to the source if there is no page fault (and thus no
> > > > page request message) in a certain amount of time. This wait time
> > > > can be configured with the new 'postcopy-rp-keepalive-period'
> > > > migration parameter; by default it is 60000 ms. Setting this
> > > > parameter to 0 disables sending keepalive messages completely.
> > >
> > > Hmm, I see.. Does it mean it would also work if we leverage tcp
> > > keepalive events?
> > > I just checked on my system, indeed it can at least be too large a
> > > default value:
> > >
> > > $ sysctl net.ipv4.tcp_keepalive_time
> > > net.ipv4.tcp_keepalive_time = 7200
> > >
> > > I wonder if it makes more sense to set it to a smaller value. What
> > > you proposed here (60s) as default sounds more reasonable indeed at
> > > least to me.
> > >
> > > Is that a concern (e.g. tcp keepalive not enabled, or timeout too
> > > large) so that maybe it's better we have our own way to timeout the
> > > session? If so, maybe worthwhile to add some comment on that.
> >
> > Accepting or relaying TCP keepalive packets is not required by the
> > RFC and is not enabled on the sockets used (at least on the
> > destination side, which I checked). Therefore, I opted to implement
> > it at a higher layer, in the migration protocol itself. I can mention
> > it in the commit message.
> >
> > However, even with a 60s keepalive period, the time it takes for the
> > TCP stack to give up trying to send a packet can be around 15-20
> > minutes. This could possibly be configured at the socket level with
> > the 'TCP_USER_TIMEOUT' socket option.
>
> I see it's defined as:
>
> #define TCP_USER_TIMEOUT 18 /* How long for loss retry before timeout */
>
> It doesn't look like it's the knob for the keep alive?
>
> While at it, I did check the relevant knobs for keep alive, and looks
> like they do exist, and they can also be configured per-socket (and
> IIUC it shouldn't be affected by the global configuration). Then..
> instead of this new migration packet, how about we configure tcp keep
> alive on migration sockets? We may also not need a compat property
> entry.
>
> IIUC, we need to tune below knobs:
>
> #define TCP_KEEPIDLE 4 /* Start keeplives after this period */
> #define TCP_KEEPINTVL 5 /* Interval between keepalives */
> #define TCP_KEEPCNT 6 /* Number of keepalives before death */
>
> So maybe we can setup SO_KEEPALIVE to 1 to enable keepalive, then
> adjust above three knobs to make sure it HUPs properly when detecting
> network down?
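[The per-socket setup Peter describes could look like the following minimal sketch. This is not QEMU code: the helper name and the example timeout values are illustrative assumptions, and the options are Linux-specific.]

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Sketch: enable TCP keepalive on a migration socket and override the
 * system-wide net.ipv4.tcp_keepalive_* defaults for this socket only.
 * With these knobs, a dead peer is detected (the socket HUPs) after
 * roughly idle_s + intvl_s * cnt seconds of silence.
 */
static int set_migration_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_s, sizeof(idle_s)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &intvl_s, sizeof(intvl_s)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0) {
        return -1;
    }
    return 0;
}
```

[Note that this bounds detection of an idle dead connection; TCP_USER_TIMEOUT (in milliseconds) separately bounds how long unacknowledged in-flight data may be retransmitted, which is the 15-20 minute giveup Juraj mentions above.]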
Given that the change does not cover communication with older QEMU,
trying to use the TCP keepalive might be a better option in the end.
That should be transparent to the receiving process if I understand it
correctly, so it would also work with an older source-side QEMU.

> > [...]
> >
> > > > @@ -2399,6 +2407,10 @@ static void *source_return_path_thread(void *opaque)
> > > >              trace_source_return_path_thread_switchover_acked();
> > > >              break;
> > > >
> > > > +        case MIG_RP_MSG_KEEPALIVE:
> > > > +            trace_source_return_path_thread_received_keepalive();
> > > > +            break;
> > > > +
> > > >          default:
> > > >              break;
> > >
> > > So we ignore unknown messages on return path. I didn't expect it,
> > > but looks like it means we can enable this without worrying about
> > > compatibility with older QEMU binaries, am I right?
> >
> > I also thought this and it seemed to be working when I tested it, but
> > actually it isn't the case. Due to the 'header_type >= MIG_RP_MSG_MAX'
> > condition after reading the header type, the return path on the
> > source side fails. So we need to change the default to 0, so it is
> > disabled by default.
>
> We figured this out offline.. :) We'll need a compat property if going
> with this approach.
>
> --
> Peter Xu

--
Juraj Marcin
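[Editor's note on the compatibility issue discussed above: the failure comes from the header validation that runs before the tolerant "default: break;" dispatch arm. The following is a paraphrased sketch, not the exact QEMU source; the enum contents and the function name are illustrative.]

```c
/*
 * Paraphrased sketch of the source-side return-path header validation.
 * The message IDs below are illustrative, not copied from QEMU.
 */
enum {
    MIG_RP_MSG_INVALID = 0,
    MIG_RP_MSG_SHUT,
    MIG_RP_MSG_PONG,
    MIG_RP_MSG_REQ_PAGES_ID,
    MIG_RP_MSG_REQ_PAGES,
    MIG_RP_MSG_RECV_BITMAP,
    MIG_RP_MSG_RESUME_ACK,
    MIG_RP_MSG_SWITCHOVER_ACK,
    MIG_RP_MSG_MAX,   /* an old binary's MAX sits below any newer message ID */
};

static int check_rp_header_type(unsigned int header_type)
{
    /*
     * A message ID an old source does not know about (e.g. a new
     * MIG_RP_MSG_KEEPALIVE equal to the old MIG_RP_MSG_MAX) fails this
     * bounds check, so the return path errors out instead of ever
     * reaching the tolerant default: arm of the dispatch switch.
     */
    if (header_type >= MIG_RP_MSG_MAX ||
        header_type == MIG_RP_MSG_INVALID) {
        return -1;
    }
    return 0;
}
```

[This is why sending the new message to an old source breaks the session, and why either a compat property or the transparent TCP-level keepalive is needed.]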