On Fri, Mar 27, 2020 at 12:42:21PM -0500, Eric Blake wrote:
> On 3/27/20 11:35 AM, Daniel P. Berrangé wrote:
> > On Fri, Mar 27, 2020 at 11:19:36AM -0500, Eric Blake wrote:
> > > Although the remote end should always be tolerant of a socket being
> > > arbitrarily closed, there are situations where it is a lot easier if
> > > the remote end can be guaranteed to read EOF even before the socket
> > > has closed.  In particular, when using gnutls, if we fail to inform
> > > the remote end about an impending teardown, the remote end cannot
> > > distinguish between our closing the socket as intended vs. a malicious
> > > intermediary interrupting things, and may result in spurious error
> > > messages.
> >
> > Does this actually matter in the NBD case ?
> >
> > It has an explicit NBD command for requesting shutdown, and once
> > that's processed, it is fine to just close the socket abruptly - I
> > don't see a benefit to a TLS shutdown sequence on top.
>
> You're right that the NBD protocol has ways for the client to advertise it
> will be shutting down, AND documents that the server must be robust to
> clients that just abruptly disconnect after that point.  But we don't have
> control over all such servers, and there may very well be a server that logs
> an error on abrupt closure, where it would be silent if we did a proper
> gnutls_bye.  Which is more important: maximum speed in disconnecting after
> we expressed intent, or maximum attempt at catering to all sorts of remote
> implementations that might not be as tolerant as qemu is of an abrupt
> termination?

It is the cost / benefit tradeoff here that matters. Correctly using
gnutls_bye() in contexts which aren't expected to block is non-trivial,
bringing notable extra code complexity. It isn't an obvious win to me
for something that just changes an error message for a scenario that
can already be cleanly handled at the application protocol level.

>
> > AFAIK, the TLS level clean shutdown is only required if the
> > application protocol does not have any way to determine an
> > unexpected shutdown itself.
>
> 'man gnutls_bye' states:
>
>        Note that not all implementations will properly terminate a TLS
>        connection.  Some of them, usually for performance reasons, will
>        terminate only the underlying transport layer, and thus not
>        distinguishing between a malicious party prematurely terminating
>        the connection and normal termination.
>
> You're right that because the protocol has an explicit message, we can
> reliably distinguish any early termination prior to
> NBD_OPT_ABORT/NBD_CMD_DISC as being malicious, so the only case where it
> matters is if we have a premature termination after we asked for clean
> shutdown, at which point a malicious termination didn't lose any data.  So on
> that front, I guess you are right that not using gnutls_bye isn't going to
> have much impact.
>
> >
> > This is relevant for HTTP where the connection data stream may not
> > have a well defined end condition.
> >
> > In the NBD case though, we have an explicit NBD_CMD_DISC to trigger
> > the disconnect. After processing that message, an EOF is acceptable
> > regardless of whether ,
> > before processing that message, any EOF is a unexpected.
> >
> > > Or, we can end up with a deadlock where both ends are stuck
> > > on a read() from the other end but neither gets an EOF.
> >
> > If the socket has been closed abruptly why would it get stuck in
> > read() - it should see EOF surely ?
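
To make the protocol-level argument quoted above concrete, here is a
minimal sketch of how an endpoint could classify an EOF purely from NBD
state, without relying on a TLS close notification. All names below
(conn_state, eof_verdict, classify_eof, tls_bye_received) are
hypothetical and invented for illustration; they are not qemu, nbdkit
or libnbd identifiers, and this is not how any of those projects
implements it.

```c
#include <stdbool.h>

/* Hypothetical connection states, for illustration only. */
enum conn_state {
    CONN_ACTIVE,          /* no disconnect request has been processed yet */
    CONN_DISC_PROCESSED   /* NBD_CMD_DISC or NBD_OPT_ABORT has been processed */
};

/* How an EOF on the transport should be treated. */
enum eof_verdict {
    EOF_EXPECTED,         /* peer is allowed to close abruptly */
    EOF_UNEXPECTED        /* premature termination, report an error */
};

/*
 * Classify an EOF using only application protocol state.
 * tls_bye_received is deliberately ignored: under the argument in this
 * thread, whether the peer sent a TLS close notification does not change
 * the verdict once the NBD-level disconnect has been processed.
 */
static enum eof_verdict classify_eof(enum conn_state state,
                                     bool tls_bye_received)
{
    (void)tls_bye_received;
    return state == CONN_DISC_PROCESSED ? EOF_EXPECTED : EOF_UNEXPECTED;
}
```

Under these assumptions the TLS-level signal adds no information to the
decision, which is why the extra complexity of a non-blocking
gnutls_bye() looks hard to justify here.
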
>
> That's what I'm trying to figure out: the nbdkit testsuite definitely hung
> even though 'qemu-nbd --list' exited, but I haven't yet figured out whether
> the bug lies in nbdkit proper or in libnbd, nor whether a cleaner tls
> shutdown would have prevented the hang in a more reliable manner.
> https://www.redhat.com/archives/libguestfs/2020-March/msg00191.html

Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|