Hi Daniel, thank you for the feedback.
On 2025-09-09 16:09, Daniel P. Berrangé wrote: > On Tue, Sep 09, 2025 at 05:01:24PM +0200, Juraj Marcin wrote: > > From: Juraj Marcin <jmar...@redhat.com> > > > > Usual system defaults for TCP keep-alive options are too long for > > migration workload. On Linux, a TCP connection waits idle for 2 hours > > before it starts checking if the connection is not broken. > > > > Now when InetSocketAddress supports keep-alive options [1], this patch > > applies migration specific defaults if they are not supplied by the user > > or the management software. With these defaults, a migration TCP stream > > waits idle for 1 minute and then sends 5 TCP keep-alive packets in 30 > > second interval before considering the connection as broken. > > > > System defaults can be still used by explicitly setting these parameters > > to 0. > > IMHO this is not a good idea. This is a very short default, which > may be fine for the scenario where your network conn is permanently > dead, but it is going to cause undesirable failures when the network > conn is only temporarily dead. > > Optimizing defaults for temporary outages is much more preferrable > as that maximises reliability of migration. In the case of permanent > outages, it is already possible to tear down the connection without > waiting for a keep-alive timeout, and liveliness checks can also be > perform by the mgmt app at a higher level too. The TCP keepalives > are just an eventual failsafe, and having those work on a long > timeframe is OK. So, would extending the keep-alive parameter to, for example 30 minutes be an acceptable solution? Or the system default is preferred and the patch should just enable keep-alive without setting other parameters? Thank you! Best regards, Juraj Marcin > > > > > [1]: 1bd4237cb1 "util/qemu-sockets: Introduce inet socket options > > controlling TCP keep-alive" > > > > Signed-off-by: Juraj Marcin <jmar...@redhat.com> > > --- > > migration/migration.c | 39 +++++++++++++++++++++++++++++++++++++++ > > qapi/migration.json | 6 ++++++ > > 2 files changed, 45 insertions(+) > > > > diff --git a/migration/migration.c b/migration/migration.c > > index 10c216d25d..a1f1223946 100644 > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -74,6 +74,11 @@ > > > > #define INMIGRATE_DEFAULT_EXIT_ON_ERROR true > > > > +#define INET_SOCKET_DEFAULT_KEEP_ALIVE true > > +#define INET_SOCKET_DEFAULT_KEEP_ALIVE_COUNT 5 > > +#define INET_SOCKET_DEFAULT_KEEP_ALIVE_IDLE 60 > > +#define INET_SOCKET_DEFAULT_KEEP_ALIVE_INTERVAL 30 > > + > > static NotifierWithReturnList migration_state_notifiers[] = { > > NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL), > > NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT), > > @@ -718,6 +723,36 @@ bool migrate_uri_parse(const char *uri, > > MigrationChannel **channel, > > return true; > > } > > > > +static void migration_address_apply_defaults(MigrationAddress *addr) > > +{ > > + if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET && > > + addr->u.socket.type == SOCKET_ADDRESS_TYPE_INET) { > > + InetSocketAddress *inet = &addr->u.socket.u.inet; > > + if (!inet->has_keep_alive) { > > + inet->has_keep_alive = true; > > + inet->keep_alive = INET_SOCKET_DEFAULT_KEEP_ALIVE; > > + } > > +#ifdef HAVE_TCP_KEEPCNT > > + if (!inet->has_keep_alive_count) { > > + inet->has_keep_alive_count = true; > > + inet->keep_alive_count = INET_SOCKET_DEFAULT_KEEP_ALIVE_COUNT; > > + } > > +#endif > > +#ifdef HAVE_TCP_KEEPIDLE > > + if (!inet->has_keep_alive_idle) { > > + inet->has_keep_alive_idle = true; > > + inet->keep_alive_idle = INET_SOCKET_DEFAULT_KEEP_ALIVE_IDLE; > > + } > > +#endif > > +#ifdef HAVE_TCP_KEEPINTVL > > + if (!inet->has_keep_alive_interval) { > > + inet->has_keep_alive_interval = true; > > + inet->keep_alive_interval = > > INET_SOCKET_DEFAULT_KEEP_ALIVE_INTERVAL; > > + } > > +#endif > > + } > > +} > > + > > static bool > > migration_incoming_state_setup(MigrationIncomingState *mis, Error **errp) > > { > > @@ -775,6 +810,8 @@ static void qemu_start_incoming_migration(const char > > *uri, bool has_channels, > > addr = channel->addr; > > } > > > > + migration_address_apply_defaults(addr); > > + > > /* transport mechanism not suitable for migration? */ > > if (!migration_transport_compatible(addr, errp)) { > > return; > > @@ -2232,6 +2269,8 @@ void qmp_migrate(const char *uri, bool has_channels, > > addr = channel->addr; > > } > > > > + migration_address_apply_defaults(addr); > > + > > /* transport mechanism not suitable for migration? */ > > if (!migration_transport_compatible(addr, errp)) { > > return; > > diff --git a/qapi/migration.json b/qapi/migration.json > > index 2387c21e9c..68d4acb5db 100644 > > --- a/qapi/migration.json > > +++ b/qapi/migration.json > > @@ -1639,6 +1639,12 @@ > > # > > # Migration endpoint configuration. > > # > > +# If transport is socket of type inet, it has the following defaults: > > +# keep-alive: true, keep-alive-count: 5, keep-alive-idle: 60 seconds, > > +# keep-alive-interval: 30 seconds. System defaults can be used by > > +# explicitly setting these parameters to 0. See `InetSocketAddress` for > > +# more details. > > +# > > # @transport: The migration stream transport mechanism > > # > > # Since: 8.2 > > -- > > 2.51.0 > > > > With regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| >