Hi Daniel,

thank you for the feedback.

On 2025-09-09 16:09, Daniel P. Berrangé wrote:
> On Tue, Sep 09, 2025 at 05:01:24PM +0200, Juraj Marcin wrote:
> > From: Juraj Marcin <jmar...@redhat.com>
> > 
> > Usual system defaults for TCP keep-alive options are too long for
> > migration workload. On Linux, a TCP connection waits idle for 2 hours
> > before it starts checking if the connection is not broken.
> > 
> > Now when InetSocketAddress supports keep-alive options [1], this patch
> > applies migration specific defaults if they are not supplied by the user
> > or the management software. With these defaults, a migration TCP stream
> > waits idle for 1 minute and then sends 5 TCP keep-alive packets in 30
> > second interval before considering the connection as broken.
> > 
> > System defaults can be still used by explicitly setting these parameters
> > to 0.
> 
> IMHO this is not a good idea. This is a very short default, which
> may be fine for the scenario where your network conn is permanently
> dead, but it is going to cause undesirable failures when the network
> conn is only temporarily dead.
> 
> Optimizing defaults for temporary outages is much more preferrable
> as that maximises reliability of migration. In the case of permanent
> outages, it is already possible to tear down the connection without
> waiting for a keep-alive timeout, and liveliness checks can also be
> perform by the mgmt app at a higher level too. The TCP keepalives
> are just an eventual failsafe, and having those work on a long
> timeframe is OK.

So, would extending the keep-alive parameter to, for example 30 minutes
be an acceptable solution? Or the system default is preferred and the
patch should just enable keep-alive without setting other parameters?

Thank you!

Best regards,

Juraj Marcin

> 
> > 
> > [1]: 1bd4237cb1 "util/qemu-sockets: Introduce inet socket options
> >      controlling TCP keep-alive"
> > 
> > Signed-off-by: Juraj Marcin <jmar...@redhat.com>
> > ---
> >  migration/migration.c | 39 +++++++++++++++++++++++++++++++++++++++
> >  qapi/migration.json   |  6 ++++++
> >  2 files changed, 45 insertions(+)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 10c216d25d..a1f1223946 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -74,6 +74,11 @@
> >  
> >  #define INMIGRATE_DEFAULT_EXIT_ON_ERROR true
> >  
> > +#define INET_SOCKET_DEFAULT_KEEP_ALIVE true
> > +#define INET_SOCKET_DEFAULT_KEEP_ALIVE_COUNT 5
> > +#define INET_SOCKET_DEFAULT_KEEP_ALIVE_IDLE 60
> > +#define INET_SOCKET_DEFAULT_KEEP_ALIVE_INTERVAL 30
> > +
> >  static NotifierWithReturnList migration_state_notifiers[] = {
> >      NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL),
> >      NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT),
> > @@ -718,6 +723,36 @@ bool migrate_uri_parse(const char *uri, 
> > MigrationChannel **channel,
> >      return true;
> >  }
> >  
> > +static void migration_address_apply_defaults(MigrationAddress *addr)
> > +{
> > +    if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
> > +        addr->u.socket.type == SOCKET_ADDRESS_TYPE_INET) {
> > +        InetSocketAddress *inet = &addr->u.socket.u.inet;
> > +        if (!inet->has_keep_alive) {
> > +            inet->has_keep_alive = true;
> > +            inet->keep_alive = INET_SOCKET_DEFAULT_KEEP_ALIVE;
> > +        }
> > +#ifdef HAVE_TCP_KEEPCNT
> > +        if (!inet->has_keep_alive_count) {
> > +            inet->has_keep_alive_count = true;
> > +            inet->keep_alive_count = INET_SOCKET_DEFAULT_KEEP_ALIVE_COUNT;
> > +        }
> > +#endif
> > +#ifdef HAVE_TCP_KEEPIDLE
> > +        if (!inet->has_keep_alive_idle) {
> > +            inet->has_keep_alive_idle = true;
> > +            inet->keep_alive_idle = INET_SOCKET_DEFAULT_KEEP_ALIVE_IDLE;
> > +        }
> > +#endif
> > +#ifdef HAVE_TCP_KEEPINTVL
> > +        if (!inet->has_keep_alive_interval) {
> > +            inet->has_keep_alive_interval = true;
> > +            inet->keep_alive_interval = 
> > INET_SOCKET_DEFAULT_KEEP_ALIVE_INTERVAL;
> > +        }
> > +#endif
> > +    }
> > +}
> > +
> >  static bool
> >  migration_incoming_state_setup(MigrationIncomingState *mis, Error **errp)
> >  {
> > @@ -775,6 +810,8 @@ static void qemu_start_incoming_migration(const char 
> > *uri, bool has_channels,
> >          addr = channel->addr;
> >      }
> >  
> > +    migration_address_apply_defaults(addr);
> > +
> >      /* transport mechanism not suitable for migration? */
> >      if (!migration_transport_compatible(addr, errp)) {
> >          return;
> > @@ -2232,6 +2269,8 @@ void qmp_migrate(const char *uri, bool has_channels,
> >          addr = channel->addr;
> >      }
> >  
> > +    migration_address_apply_defaults(addr);
> > +
> >      /* transport mechanism not suitable for migration? */
> >      if (!migration_transport_compatible(addr, errp)) {
> >          return;
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 2387c21e9c..68d4acb5db 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -1639,6 +1639,12 @@
> >  #
> >  # Migration endpoint configuration.
> >  #
> > +# If transport is socket of type inet, it has the following defaults:
> > +# keep-alive: true, keep-alive-count: 5, keep-alive-idle: 60 seconds,
> > +# keep-alive-interval: 30 seconds.  System defaults can be used by
> > +# explicitly setting these parameters to 0.  See `InetSocketAddress` for
> > +# more details.
> > +#
> >  # @transport: The migration stream transport mechanism
> >  #
> >  # Since: 8.2
> > -- 
> > 2.51.0
> > 
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 


Reply via email to