* Daniel P. Berrangé (berra...@redhat.com) wrote: > On Thu, May 27, 2021 at 09:09:09AM -0400, Peter Xu wrote: > > On Thu, May 27, 2021 at 01:37:42PM +0100, Daniel P. Berrangé wrote: > > > On Thu, May 27, 2021 at 08:23:52AM -0400, Peter Xu wrote: > > > > On Thu, May 27, 2021 at 09:46:54AM +0100, Daniel P. Berrangé wrote: > > > > > On Wed, May 26, 2021 at 05:58:58PM -0400, Peter Xu wrote: > > > > > > On Wed, May 26, 2021 at 11:21:03PM +0200, Lukas Straub wrote: > > > > > > > On Wed, 26 May 2021 16:40:35 -0400 > > > > > > > Peter Xu <pet...@redhat.com> wrote: > > > > > > > > > > > > > > > On Wed, May 26, 2021 at 05:05:40PM -0300, Leonardo Bras wrote: > > > > > > > > > After yank feature was introduced, whenever migration is > > > > > > > > > started using TLS, > > > > > > > > > the following error happens in both source and destination > > > > > > > > > hosts: > > > > > > > > > > > > > > > > > > (qemu) qemu-kvm: ../util/yank.c:107: yank_unregister_instance: > > > > > > > > > Assertion `QLIST_EMPTY(&entry->yankfns)' failed. > > > > > > > > > > > > > > > > > > This happens because of a missing yank_unregister_function() > > > > > > > > > when using > > > > > > > > > qio-channel-tls. > > > > > > > > > > > > > > > > > > Fix this by also allowing TYPE_QIO_CHANNEL_TLS object type to > > > > > > > > > perform > > > > > > > > > yank_unregister_function() in channel_close() and > > > > > > > > > multifd_load_cleanup(). > > > > > > > > > > > > > > > > > > Fixes: 50186051f ("Introduce yank feature") > > > > > > > > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1964326 > > > > > > > > > Signed-off-by: Leonardo Bras <leobra...@gmail.com> > > > > > > > > > > > > > > > > Leo, > > > > > > > > > > > > > > > > Thanks for looking into it! > > > > > > > > > > > > > > > > So before looking int the fix... I do have a doubt on why we > > > > > > > > only enable yank > > > > > > > > on socket typed, as I think tls should also work with > > > > > > > > qio_channel_shutdown(). > > > > > > > > > > > > > > > > IIUC the confused thing here is we register only for > > > > > > > > qio-socket, however tls > > > > > > > > will actually call migration_channel_connect() twice, first > > > > > > > > with a qio-socket, > > > > > > > > then with the real tls-socket. For tls I feel like we have > > > > > > > > registered with the > > > > > > > > wrong channel - instead of the wrapper socket ioc, we should > > > > > > > > register to the > > > > > > > > final tls ioc? > > > > > > > > > > > > > > > > Lukas, is there a reason? > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > There is no specific reason. Both ways work equally well in > > > > > > > preventing > > > > > > > qemu from hanging. shutdown() for tls-channel just makes it abort > > > > > > > a > > > > > > > little sooner (by not attempting to encrypt and send data > > > > > > > anymore). > > > > > > > > > > > > > > I don't lean either way. I guess registering it on the tls-channel > > > > > > > makes is a bit more explicit and clearer. > > > > > > > > > > > > Agreed, because IMHO logically the migration code should not be > > > > > > aware of > > > > > > internals of IOChannels, e.g., we shouldn't need to know > > > > > > ioc->master is the > > > > > > socket ioc of tls ioc to unregister. > > > > > > > > > > I think it is atually better to ignore the TLS channel and *always* > > > > > yank > > > > > on the undering socket IO channel. The yank functionality is intended > > > > > to > > > > > be used in a scenario where we know the channels are broken. If yank > > > > > calls the high level IO channel it is potentially going to try to do a > > > > > cleanup shutdown that we know will fail because of the broken network. > > > > > > > > Could you elaborate what's the "cleanup shutdown"? > > > > > > > > The yank calls migration_yank_iochannel: > > > > > > > > void migration_yank_iochannel(void *opaque) > > > > { > > > > QIOChannel *ioc = QIO_CHANNEL(opaque); > > > > > > > > qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); > > > > } > > > > > > > > Where qio_channel_shutdown for tls is nothing but delivers that to the > > > > master > > > > channel: > > > > > > > > static int qio_channel_tls_shutdown(QIOChannel *ioc, > > > > QIOChannelShutdown how, > > > > Error **errp) > > > > { > > > > QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc); > > > > > > > > qatomic_or(&tioc->shutdown, how); > > > > > > > > return qio_channel_shutdown(tioc->master, how, errp); > > > > } > > > > > > > > So I thought it was a nice wrapper just for things like this, and I > > > > didn't see > > > > anything it does more than the io_shutdown for the socket channel. Did > > > > I miss > > > > something? > > > > > > Today thats the case, but don't assume it will be the case forever. > > > There is a mechanism in TLS for doing clean shutdown which we've > > > debated including. > > > > > > In general apps *can* just call the shutdown method on the QIOChannelTLS > > > object no matter what. Yank is just a little bit special because of its > > > need to be guaranteed to work even when the network is dead. So yank > > > should always directly call the low level QIOChannelSocket, so thre is > > > a strong guarantee it can't block on something. > > > > Hmm, I am still not fully convinced that that's a valid reason the migration > > code should be aware of how the socket is managed in tls channels... > > > > Does that sound more like a good reason to introduce QIOChannelShutdown with > > QIO_CHANNEL_SHUTDOWN_FORCE so it'll always not block if FORCE set? Then we > > can > > switch the yank function to use that. > > > > What do you think? > > I think that's unneccessary - the migration code already does similar > things elsewhere when it wants to distinguish TLS usage, so this is not > anything new conceptually.
I'd probably agree with Peter here that it would be preferential for the migration code not to know; what we want to do here is give the channel the opportunity to do a hard shutdown, and let the channel worry about it whether it can do that. If you look at the nbd code, it's yank registration calls qio_channel_shutdown on whatever channel it happens to have; so that's probably wrong since I think you're saying that could block. Dave > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK