On 2025-01-31 10:03, Peter Xu wrote:
> On Fri, Jan 31, 2025 at 02:42:41PM +0100, Juraj Marcin wrote:
> > On 2025-01-30 16:04, Peter Xu wrote:
> > > On Thu, Jan 30, 2025 at 05:11:36PM +0100, Juraj Marcin wrote:
> > > > When there are no page requests from the destination side and the
> > > > connection breaks, the destination might not be aware of it. This is
> > > > the case for example with a TCP connection, which by default remains
> > > > open unless it is explicitly closed or the TCP stack fails to
> > > > successfully send a packet.
> > > > 
> > > > Such situation without any traffic from the destination to the source
> > > > side can happen for multiple reasons. For example, if all pages accessed
> > > > at idle are already migrated, but there are still non-migrated pages
> > > > that are accessed very infrequently, or if the destination QEMU is
> > > > started with '-S' option and the management application does not run
> > > > the 'cont' QMP command right away.
> > > > 
> > > > Due to this, the destination side might not know about the network
> > > > failure and stays in the postcopy-active state instead of switching
> > > > to the postcopy-paused state, informing the destination management
> > > > application about the issue.
> > > > 
> > > > This patch resolves this by sending a keepalive message on the return
> > > > path to the source if there is no page fault (thus page request message)
> > > > in a certain amount of time. This wait time can be configured with new
> > > > 'postcopy-rp-keepalive-period' migration parameter, by default it is
> > > > 60000 ms. By setting this parameter to 0, sending keepalive messages
> > > > can be disabled completely.
> > > 
> > > Hmm, I see.. Does it mean it would also work if we leverage tcp keepalive
> > > events?  I just checked on my system, indeed it can at least be too large a
> > > default value:
> > > 
> > > $ sysctl net.ipv4.tcp_keepalive_time
> > > net.ipv4.tcp_keepalive_time = 7200
> > > 
> > > I wonder if it makes more sense to set it to a smaller value.  What you
> > > proposed here (60s) as default sounds more reasonable indeed at least to
> > > me.
> > > 
> > > Is that a concern (e.g. tcp keepalive not enabled, or timeout too large) so
> > > that maybe it's better we have our own way to timeout the session?  If so,
> > > maybe worthwhile to add some comment on that.
> > 
> > Accepting or relaying TCP keepalive packets is not required by the RFC
> > and is not enabled on the sockets used (at least on the destination side,
> > which I checked). Therefore, I opted to implement it at a higher layer, in
> > the migration protocol itself. I can mention it in the commit message.
> > 
> > However, even with 60s keepalive period, the time it takes for the TCP
> > stack to give up trying to send a packet can be around 15-20 minutes.
> > This could be possibly configured at socket level with
> > 'TCP_USER_TIMEOUT' socket option.
> 
> I see it's defined as:
> 
> #define TCP_USER_TIMEOUT      18      /* How long for loss retry before timeout */
> 
> It doesn't look like to be the knob for the keep alive?
> 
> Since at it, I did check the relevant knobs for keep alive, and looks like
> they do exist, and they can also be configured per-socket (and IIUC it
> shouldn't be affected by the global configuration).  Then.. instead of this
> new migration packet, how about we configure tcp keep alive on migration
> sockets?  We may also not need a compat property entry.
> 
> IIUC, we need to tune below knobs:
> 
> #define TCP_KEEPIDLE          4       /* Start keeplives after this period */
> #define TCP_KEEPINTVL         5       /* Interval between keepalives */
> #define TCP_KEEPCNT           6       /* Number of keepalives before death */
> 
> So maybe we can setup SO_KEEPALIVE to 1 to enable keepalive, then adjust
> above three knobs to make sure it HUPs properly when detecting network
> down?

Given that this change does not work when communicating with an older QEMU,
using TCP keepalive might be the better option in the end. If I understand
it correctly, it is transparent to the receiving process, so it would also
work with an older QEMU on the source side.

> 
> [...]
> 
> > > > @@ -2399,6 +2407,10 @@ static void *source_return_path_thread(void 
> > > > *opaque)
> > > >              trace_source_return_path_thread_switchover_acked();
> > > >              break;
> > > >  
> > > > +        case MIG_RP_MSG_KEEPALIVE:
> > > > +            trace_source_return_path_thread_received_keepalive();
> > > > +            break;
> > > > +
> > > >          default:
> > > >              break;
> > > 
> > > So we ignore unknown messages on return path.  I didn't expect it, but
> > > looks like it means we can enable this without worrying about 
> > > compatibility
> > > with older QEMU binaries, am I right?
> > 
> > I thought so too, and it seemed to work when I tested it, but that is
> > actually not the case. Due to the 'header_type >= MIG_RP_MSG_MAX'
> > check after reading the header type, the return path on the source
> > side fails. So we need to change the default to 0, so that it is
> > disabled by default.
> 
> We figured this out offline.. :)  We'll need a compat property if going
> with this approach.
> 
> -- 
> Peter Xu
> 

-- 
Juraj Marcin

