Re: [RFC PATCH v2 3/8] migration/multifd: Terminate the TLS connection

2025-02-10 Thread Peter Xu
On Fri, Feb 07, 2025 at 03:15:48PM -0300, Fabiano Rosas wrote:
> >> +    for (i = 0; i < migrate_multifd_channels(); i++) {
> >> +        MultiFDSendParams *p = &multifd_send_state->params[i];
> >> +
> >> +        /* thread_created implies the TLS handshake has succeeded */
> >> +        if (p->tls_thread_created && p->thread_created) {
> >> +            Error *local_err = NULL;
> >> +            /*
> >> +             * The destination expects the TLS session to always be
> >> +             * properly terminated. This helps to detect a premature
> >> +             * termination in the middle of the stream.  Note that
> >> +             * older QEMUs always break the connection on the source
> >> +             * and the destination always sees
> >> +             * GNUTLS_E_PREMATURE_TERMINATION.
> >> +             */
> >> +            migration_tls_channel_end(p->c, &local_err);
> >> +
> >> +            if (local_err) {
> >> +                /*
> >> +                 * The above can fail with broken pipe due to a
> >> +                 * previous migration error, ignore the error.
> >> +                 */
> >> +                assert(migration_has_failed(migrate_get_current()));
> >
> > Considering this is still src, do we want to be softer on this by
> > error_report?
> >
> > Logically !migration_has_failed() means it succeeded, so we can throw src
> > qemu away now, that shouldn't be a huge deal. More of a thinking-out-loud
> > kind of comment..  Your call.
> >
> 
> Maybe even a warning? If at this point migration succeeded, it's probably
> best to let cleanup carry on.

Yep, warning sounds good too.
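
A minimal sketch of the softer handling discussed above, assuming a failed TLS
"bye" is only worth a warning when the migration itself has not failed.
FakeChannel, fake_tls_channel_end() and shutdown_tls_sessions() are
hypothetical stand-ins for illustration only; they are not QEMU APIs and this
is not the code that was eventually posted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for one multifd send channel. */
typedef struct {
    bool tls_session_up;   /* TLS handshake completed on this channel */
    bool peer_gone;        /* connection already broken, so "bye" will fail */
} FakeChannel;

/*
 * Hypothetical stand-in for migration_tls_channel_end().
 * Returns 0 on success, -1 if the session could not be ended.
 */
static int fake_tls_channel_end(const FakeChannel *c)
{
    assert(c != NULL);
    return c->peer_gone ? -1 : 0;
}

/*
 * Try to end the TLS session on every channel that has one.
 *
 * A failure is expected (and silently ignored) when the migration has
 * already failed. A failure after a successful migration is only warned
 * about, so that cleanup carries on instead of aborting the source.
 *
 * Returns the number of unexpected failures.
 */
static int shutdown_tls_sessions(const FakeChannel *chans, int n,
                                 bool migration_failed)
{
    int unexpected = 0;

    for (int i = 0; i < n; i++) {
        if (!chans[i].tls_session_up) {
            continue;
        }
        if (fake_tls_channel_end(&chans[i]) < 0 && !migration_failed) {
            fprintf(stderr,
                    "warning: channel %d: failed to end TLS session\n", i);
            unexpected++;
        }
    }
    return unexpected;
}
```

The point of the sketch is only the control flow: the loop never stops early,
and the "migration succeeded but bye failed" case is reported rather than
asserted on.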

-- 
Peter Xu




Re: [RFC PATCH v2 3/8] migration/multifd: Terminate the TLS connection

2025-02-07 Thread Fabiano Rosas
Peter Xu  writes:

> On Fri, Feb 07, 2025 at 11:27:53AM -0300, Fabiano Rosas wrote:
>> The multifd recv side has been getting a TLS error of
>> GNUTLS_E_PREMATURE_TERMINATION at the end of migration when the send
>> side closes the sockets without ending the TLS session. This has been
>> masked by the code not checking the migration error after loadvm.
>> 
>> Start ending the TLS session at multifd_send_shutdown() so the recv
>> side always sees a clean termination (EOF) and we can start to
>> differentiate that from an actual premature termination that might
>> possibly happen in the middle of the migration.
>> 
>> There's nothing to be done if a previous migration error has already
>> broken the connection, so add a comment explaining it and ignore any
>> errors coming from gnutls_bye().
>> 
>> This doesn't break compat with older recv-side QEMUs because EOF has
>> always caused the recv thread to exit cleanly.
>> 
>> Signed-off-by: Fabiano Rosas 
>
> Reviewed-by: Peter Xu 
>
> One trivial comment..
>
>> ---
>>  migration/multifd.c | 34 +-
>>  migration/tls.c |  5 +
>>  migration/tls.h |  2 +-
>>  3 files changed, 39 insertions(+), 2 deletions(-)
>> 
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index ab73d6d984..b57cad3bb1 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -490,6 +490,32 @@ void multifd_send_shutdown(void)
>>          return;
>>      }
>>  
>> +    for (i = 0; i < migrate_multifd_channels(); i++) {
>> +        MultiFDSendParams *p = &multifd_send_state->params[i];
>> +
>> +        /* thread_created implies the TLS handshake has succeeded */
>> +        if (p->tls_thread_created && p->thread_created) {
>> +            Error *local_err = NULL;
>> +            /*
>> +             * The destination expects the TLS session to always be
>> +             * properly terminated. This helps to detect a premature
>> +             * termination in the middle of the stream.  Note that
>> +             * older QEMUs always break the connection on the source
>> +             * and the destination always sees
>> +             * GNUTLS_E_PREMATURE_TERMINATION.
>> +             */
>> +            migration_tls_channel_end(p->c, &local_err);
>> +
>> +            if (local_err) {
>> +                /*
>> +                 * The above can fail with broken pipe due to a
>> +                 * previous migration error, ignore the error.
>> +                 */
>> +                assert(migration_has_failed(migrate_get_current()));
>
> Considering this is still src, do we want to be softer on this by
> error_report?
>
> Logically !migration_has_failed() means it succeeded, so we can throw src
> qemu away now, that shouldn't be a huge deal. More of a thinking-out-loud
> kind of comment..  Your call.
>

Maybe even a warning? If at this point migration succeeded, it's probably
best to let cleanup carry on.

>> +            }
>> +        }
>> +    }
>> +
>>      multifd_send_terminate_threads();
>>  
>>      for (i = 0; i < migrate_multifd_channels(); i++) {
>> @@ -1141,7 +1167,13 @@ static void *multifd_recv_thread(void *opaque)
>>  
>>              ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
>>                                             p->packet_len, &local_err);
>> -            if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
>> +            if (!ret) {
>> +                /* EOF */
>> +                assert(!local_err);
>> +                break;
>> +            }
>> +
>> +            if (ret == -1) {
>>                  break;
>>              }
>>  
>> diff --git a/migration/tls.c b/migration/tls.c
>> index fa03d9136c..5cbf952383 100644
>> --- a/migration/tls.c
>> +++ b/migration/tls.c
>> @@ -156,6 +156,11 @@ void migration_tls_channel_connect(MigrationState *s,
>>NULL);
>>  }
>>  
>> +void migration_tls_channel_end(QIOChannel *ioc, Error **errp)
>> +{
>> +    qio_channel_tls_bye(QIO_CHANNEL_TLS(ioc), errp);
>> +}
>> +
>>  bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc)
>>  {
>>  if (!migrate_tls()) {
>> diff --git a/migration/tls.h b/migration/tls.h
>> index 5797d153cb..58b25e1228 100644
>> --- a/migration/tls.h
>> +++ b/migration/tls.h
>> @@ -36,7 +36,7 @@ void migration_tls_channel_connect(MigrationState *s,
>> QIOChannel *ioc,
>> const char *hostname,
>> Error **errp);
>> -
>> +void migration_tls_channel_end(QIOChannel *ioc, Error **errp);
>>  /* Whether the QIO channel requires further TLS handshake? */
>>  bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc);
>>  
>> -- 
>> 2.35.3
>> 



Re: [RFC PATCH v2 3/8] migration/multifd: Terminate the TLS connection

2025-02-07 Thread Peter Xu
On Fri, Feb 07, 2025 at 11:27:53AM -0300, Fabiano Rosas wrote:
> The multifd recv side has been getting a TLS error of
> GNUTLS_E_PREMATURE_TERMINATION at the end of migration when the send
> side closes the sockets without ending the TLS session. This has been
> masked by the code not checking the migration error after loadvm.
> 
> Start ending the TLS session at multifd_send_shutdown() so the recv
> side always sees a clean termination (EOF) and we can start to
> differentiate that from an actual premature termination that might
> possibly happen in the middle of the migration.
> 
> There's nothing to be done if a previous migration error has already
> broken the connection, so add a comment explaining it and ignore any
> errors coming from gnutls_bye().
> 
> This doesn't break compat with older recv-side QEMUs because EOF has
> always caused the recv thread to exit cleanly.
> 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Peter Xu 

One trivial comment..

> ---
>  migration/multifd.c | 34 +-
>  migration/tls.c |  5 +
>  migration/tls.h |  2 +-
>  3 files changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index ab73d6d984..b57cad3bb1 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -490,6 +490,32 @@ void multifd_send_shutdown(void)
>          return;
>      }
>  
> +    for (i = 0; i < migrate_multifd_channels(); i++) {
> +        MultiFDSendParams *p = &multifd_send_state->params[i];
> +
> +        /* thread_created implies the TLS handshake has succeeded */
> +        if (p->tls_thread_created && p->thread_created) {
> +            Error *local_err = NULL;
> +            /*
> +             * The destination expects the TLS session to always be
> +             * properly terminated. This helps to detect a premature
> +             * termination in the middle of the stream.  Note that
> +             * older QEMUs always break the connection on the source
> +             * and the destination always sees
> +             * GNUTLS_E_PREMATURE_TERMINATION.
> +             */
> +            migration_tls_channel_end(p->c, &local_err);
> +
> +            if (local_err) {
> +                /*
> +                 * The above can fail with broken pipe due to a
> +                 * previous migration error, ignore the error.
> +                 */
> +                assert(migration_has_failed(migrate_get_current()));

Considering this is still src, do we want to be softer on this by
error_report?

Logically !migration_has_failed() means it succeeded, so we can throw src
qemu away now, that shouldn't be a huge deal. More of a thinking-out-loud
kind of comment..  Your call.

> +            }
> +        }
> +    }
> +
>      multifd_send_terminate_threads();
>  
>      for (i = 0; i < migrate_multifd_channels(); i++) {
> @@ -1141,7 +1167,13 @@ static void *multifd_recv_thread(void *opaque)
>  
>              ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
>                                             p->packet_len, &local_err);
> -            if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
> +            if (!ret) {
> +                /* EOF */
> +                assert(!local_err);
> +                break;
> +            }
> +
> +            if (ret == -1) {
>                  break;
>              }
>  
> diff --git a/migration/tls.c b/migration/tls.c
> index fa03d9136c..5cbf952383 100644
> --- a/migration/tls.c
> +++ b/migration/tls.c
> @@ -156,6 +156,11 @@ void migration_tls_channel_connect(MigrationState *s,
>NULL);
>  }
>  
> +void migration_tls_channel_end(QIOChannel *ioc, Error **errp)
> +{
> +    qio_channel_tls_bye(QIO_CHANNEL_TLS(ioc), errp);
> +}
> +
>  bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc)
>  {
>  if (!migrate_tls()) {
> diff --git a/migration/tls.h b/migration/tls.h
> index 5797d153cb..58b25e1228 100644
> --- a/migration/tls.h
> +++ b/migration/tls.h
> @@ -36,7 +36,7 @@ void migration_tls_channel_connect(MigrationState *s,
> QIOChannel *ioc,
> const char *hostname,
> Error **errp);
> -
> +void migration_tls_channel_end(QIOChannel *ioc, Error **errp);
>  /* Whether the QIO channel requires further TLS handshake? */
>  bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc);
>  
> -- 
> 2.35.3
> 

-- 
Peter Xu
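
For reference, the recv-side hunk in the patch separates a clean EOF (return
value 0, no error set) from a real error (return value -1). The following is a
rough, self-contained model of that three-way distinction, assuming a read
helper that returns 1 when a full packet was read, 0 when the stream ended
before any byte of the packet, and -1 when it ended partway through.
FakeStream, fake_read_all_eof() and recv_loop() are hypothetical names made up
for this sketch and are not part of QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stream standing in for a channel. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;
} FakeStream;

/*
 * Read exactly buflen bytes.
 * Returns 1 if all bytes were read, 0 if the stream was already at its
 * end (clean EOF), and -1 if it ended in the middle of the request.
 */
static int fake_read_all_eof(FakeStream *s, void *buf, size_t buflen)
{
    size_t avail = s->len - s->pos;

    if (avail == 0) {
        return 0;
    }
    if (avail < buflen) {
        s->pos = s->len;
        return -1;
    }
    memcpy(buf, s->data + s->pos, buflen);
    s->pos += buflen;
    return 1;
}

typedef enum { RECV_CLEAN_EOF, RECV_ERROR } RecvResult;

/* Consume fixed-size packets until the stream ends, counting them. */
static RecvResult recv_loop(FakeStream *s, size_t packet_len, int *packets)
{
    unsigned char packet[16];

    assert(packet_len <= sizeof(packet));
    *packets = 0;

    for (;;) {
        int ret = fake_read_all_eof(s, packet, packet_len);

        if (ret == 0) {
            return RECV_CLEAN_EOF;   /* sender terminated properly */
        }
        if (ret == -1) {
            return RECV_ERROR;       /* stream cut off mid-packet */
        }
        (*packets)++;
    }
}
```

With the two outcomes kept apart, a caller can treat RECV_CLEAN_EOF as a
normal end of stream and reserve error reporting for RECV_ERROR.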