On Wed, Jul 27, 2016 at 01:14:54AM +0400, marcandre.lur...@redhat.com wrote: > From: Marc-André Lureau <marcandre.lur...@redhat.com> > > Hi, > > Since 'vhost-user: simple reconnection support' has been merged, it is > possible to disconnect and reconnect a vhost-user backend. However, > many code paths in qemu may trigger assert() when the backend is > disconnected. > > There are also code paths that are wrong, see "don't assume opaque is > a fd" patch for an example. Once those patches are reviewed & merged, > they are good candidates for stable too.
Thanks, I merged most of these except the new tests. > Some assert() could simply be replaced by error_report() or silently > fail since they are recoverable cases. Some missing error checks can > also help prevent later issues. The errors are reported up to vhost.c, > as the vhost-user backend alone doesn't handle disconnected state > transparently so far. There are still problematic code paths left > after this series, for example, starting a migration with a > disconnected backend will abort(). It is likely that other problematic > code path exists (vhost_scsi_start failure is fatal, but there are no > vhost-user backend that I know yet). > > In many cases, the code assumes get_vhost_net() will be non-NULL after > a succesful connection, so I changed it to stay after a disconnect > until the new connection comes (as suggested by Michael). > > Since there is feature checks on reconnection, qemu should wait for > the initial connection feature negotiation to complete. The test added > demonstrates this. Additionally, a regression was found during v2, > which could have been spotted with a multiqueue test, so add a basic > one that would have exhibited the issue. > > For convenience, the series is also available on: > https://github.com/elmarco/qemu, branch vhost-user-reconnect > > v6: > - fixes after Ilya Maximets review > - add "disconnect on HUP" for cases where peer disconnect doesn't > trigger qemu disconnect event (reconnect won't work) > - add a flags mismatch on reconnect test > > v5: > - rebased > - use a VHOST_OPS_DEBUG macro to print vhost_ops errors > - replace assert(foo != NULL) with assert(foo) > - add "RFC: vhost: do not update last avail idx" > > v4: > - change notify_migration_done() patch to be VHOST_BACKEND_TYPE_USER > specific, to avoid having to handle the case where the backend > doesn't implement the callback > - change vhost_dev_cleanup() to assert on empty log, instead of > adding a call to vhost_log_put() > - made "keep vhost_net after a disconnection" more robust, got rid of > the acked_features field > - improve commit log, and some patch reorganization for clarity > > v3: > - add vhost-user multiqueue test, which would have helped to find the > following fix > - fix waiting on vhost-user connection with multiqueue (found by > Yuanhan Liu) > - merge vhost_user_{read,write}() error checking patches > - add error reporting to vhost_user_read() (similar to > vhost_user_write()) > - add a vhost_net_set_backend() wrapper to help with potential crash > - some leak fixes > > v2: > - some patch ordering: minor fix, close(fd) fix, > assert/fprintf->error_report, check and return error, > vhost_dev_cleanup() fixes, keep vhost_net after a disconnect, wait > until connection is ready > - merge read/write error checks > - do not rely on link state to check vhost-user init completed > > Marc-André Lureau (33): > misc: indentation > vhost-user: minor simplification > vhost-user: disconnect on HUP > vhost: don't assume opaque is a fd, use backend cleanup > vhost: make vhost_log_put() idempotent > vhost: assert the log was cleaned up > vhost: fix cleanup on not fully initialized device > vhost: make vhost_dev_cleanup() idempotent > vhost-net: always call vhost_dev_cleanup() on failure > vhost: fix calling vhost_dev_cleanup() after vhost_dev_init() > vhost: do not assert() on vhost_ops failure > vhost: add missing VHOST_OPS_DEBUG > vhost: use error_report() instead of fprintf(stderr,...) > qemu-char: fix qemu_chr_fe_set_msgfds() crash when disconnected > vhost-user: call set_msgfds unconditionally > vhost-user: check qemu_chr_fe_set_msgfds() return value > vhost-user: check vhost_user_{read,write}() return value > vhost-user: keep vhost_net after a disconnection > vhost-user: add get_vhost_net() assertions > Revert "vhost-net: do not crash if backend is not present" > vhost-net: vhost_migration_done is vhost-user specific > vhost: add assert() to check runtime behaviour > char: add chr_wait_connected callback > char: add and use tcp_chr_wait_connected > vhost-user: wait until backend init is completed > tests: plug some leaks in virtio-net-test > tests: fix vhost-user-test leak > tests: add /vhost-user/connect-fail test > tests: add a simple /vhost-user/multiqueue test > vhost-user: add error report in vhost_user_write() > vhost: add vhost_net_set_backend() > vhost-user-test: add flags mismatch test > RFC: vhost: do not update last avail idx on get_vring_base() failure > > hw/net/vhost_net.c | 34 +++----- > hw/virtio/vhost-user.c | 67 ++++++++++----- > hw/virtio/vhost.c | 162 +++++++++++++++++++++++------------- > include/hw/virtio/vhost.h | 4 + > include/sysemu/char.h | 8 ++ > net/tap.c | 1 + > net/vhost-user.c | 65 +++++++++------ > qemu-char.c | 82 ++++++++++++------ > tests/Makefile.include | 2 +- > tests/vhost-user-test.c | 206 > +++++++++++++++++++++++++++++++++++++++++++++- > tests/virtio-net-test.c | 12 ++- > 11 files changed, 486 insertions(+), 157 deletions(-) > > -- > 2.9.0