Re: [Qemu-block] [PATCH 2/6 for-3.1] nbd: stop waiting for a NBD response with NBD_CMD_DISC
On 11/19/18 4:23 AM, Daniel P. Berrangé wrote: Simply setting the 'quit' flag after sending NBD_CMD_DISC is enough to get the coroutine to stop waiting for a reply and thus supress the error message. Actually, it's not quite enough - once you actually start performing I/O, enough coroutines are kicked off that the error still happens: Hmm, does the client not send requests synchronously ? I expected that any other I/O operations would have already received their reply by the time we sent the DISC command. During handshake phase (NBD_OPT_* messages), client sends requests synchronously (at most one pending read, in response to the most recent write). During transmission phase (NBD_CMD_* messages), the client is allowed to send messages asynchronously. However, the NBD protocol recommends (but not requires) that the client never send NBD_CMD_DISC to disconnect until AFTER the client has collected all responses to any earlier in-flight messages. And if I'm reading the block/nbd-client.c code correctly, in general we don't try to shutdown a client while there are any pending commands with unread replies, except when we are shutting down due to detecting a protocol error (at which point we are already out of sync with the server and waiting for replies is not going to be helpful). But I want to do more testing to make sure I'm not missing out on reporting an actual error if I add that. Yes, I'd be a little concerned about missing reporting of an error from a command other than NBD_CMD_DISC if we did this. But it looks like your approach of making the qio code behave more nicely is a smarter cleanup than my idea, so I'm willing to ditch my code in favor of yours. Testing your approach now :) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [PATCH 2/6 for-3.1] nbd: stop waiting for a NBD response with NBD_CMD_DISC
On Sat, Nov 17, 2018 at 08:19:10PM -0600, Eric Blake wrote: > On 11/16/18 9:53 AM, Daniel P. Berrangé wrote: > > When sending a NBD_CMD_DISC message there is no reply expected, > > however, the nbd_read_eof() coroutine is still waiting for a reply. > > In a plain NBD connection this doesn't matter as it will just get an > > EOF, however, on a TLS connection it will get an interrupted TLS data > > packet. The nbd_read_eof() will then print an error message on the > > console due to missing reply to NBD_CMD_DISC. > > > > This can be seen with qemu-img > > > >$ qemu-img info \ > >--object tls-creds-x509,dir=tlsdir,id=tls0,endpoint=client \ > >--image-opts driver=nbd,host=127.0.0.1,port=9000,tls-creds=tls0 > >qemu-img: Cannot read from TLS channel: Input/output error > >image: nbd://127.0.0.1:9000 > >file format: nbd > >virtual size: 10M (10485760 bytes) > >disk size: unavailable > > > > Simply setting the 'quit' flag after sending NBD_CMD_DISC is enough to > > get the coroutine to stop waiting for a reply and thus supress the error > > message. > > Actually, it's not quite enough - once you actually start performing I/O, > enough coroutines are kicked off that the error still happens: > > $ qemu-io -c 'r 1m 1m' -c 'w -P 0x22 1m 1m' --image-opts \ > --object tls-creds-x509,dir=scratch/tls/client1,endpoint=client,id=tls0\ > driver=nbd,host=localhost,port=10809,tls-creds=tls0 > read 1048576/1048576 bytes at offset 1048576 > 1 MiB, 1 ops; 0.0430 sec (23.204 MiB/sec and 23.2040 ops/sec) > wrote 1048576/1048576 bytes at offset 1048576 > 1 MiB, 1 ops; 0.0152 sec (65.479 MiB/sec and 65.4793 ops/sec) > Cannot read from TLS channel: Input/output error > > Squashing this in on top of your patch helps, though: > > diff --git i/block/nbd-client.c w/block/nbd-client.c > index 5f63e4b8f15..e7916c78996 100644 > --- i/block/nbd-client.c > +++ w/block/nbd-client.c > @@ -79,7 +79,14 @@ static coroutine_fn void nbd_read_reply_entry(void > *opaque) > assert(s->reply.handle == 0); > ret = nbd_receive_reply(s->ioc, >reply, _err); > if (local_err) { > -error_report_err(local_err); > +/* If we are already quitting, either another error has > + * already been reported, or we requested NBD_CMD_DISC and > + * don't need to report anything further. */ > +if (!s->quit) { > +error_report_err(local_err); > +} else { > +error_free(local_err); > +} > } > if (ret <= 0) { > break; > > But I want to do more testing to make sure I'm not missing out on reporting > an actual error if I add that. It occurred to me that it would be nice if the TLS channel behaved the same a normal channel when seeing EOF on the socket. Unfortunately the reason why GNUTLS returns an error in this case does in fact make sense, so it isn't quite as simple as just ignoring the error. Fortunately the NBD client has decided it wants to shutdown and crucially has called qio_channel_shutdown() at this point. Thus we can safely ignore the gnutls error code when the channel has been shutdown for read and just return 0 for EOF as with a plain socket. Thus I've sent a patch which solves the problem in that way: https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg03616.html With this applied, both my original patch and your extra chunk here should be redundant. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [Qemu-block] [PATCH 2/6 for-3.1] nbd: stop waiting for a NBD response with NBD_CMD_DISC
On Sat, Nov 17, 2018 at 08:19:10PM -0600, Eric Blake wrote: > On 11/16/18 9:53 AM, Daniel P. Berrangé wrote: > > When sending a NBD_CMD_DISC message there is no reply expected, > > however, the nbd_read_eof() coroutine is still waiting for a reply. > > In a plain NBD connection this doesn't matter as it will just get an > > EOF, however, on a TLS connection it will get an interrupted TLS data > > packet. The nbd_read_eof() will then print an error message on the > > console due to missing reply to NBD_CMD_DISC. > > > > This can be seen with qemu-img > > > >$ qemu-img info \ > >--object tls-creds-x509,dir=tlsdir,id=tls0,endpoint=client \ > >--image-opts driver=nbd,host=127.0.0.1,port=9000,tls-creds=tls0 > >qemu-img: Cannot read from TLS channel: Input/output error > >image: nbd://127.0.0.1:9000 > >file format: nbd > >virtual size: 10M (10485760 bytes) > >disk size: unavailable > > > > Simply setting the 'quit' flag after sending NBD_CMD_DISC is enough to > > get the coroutine to stop waiting for a reply and thus supress the error > > message. > > Actually, it's not quite enough - once you actually start performing I/O, > enough coroutines are kicked off that the error still happens: Hmm, does the client not send requests synchronously ? I expected that any other I/O operations would have already received their reply by the time we sent the DISC command. > > $ qemu-io -c 'r 1m 1m' -c 'w -P 0x22 1m 1m' --image-opts \ > --object tls-creds-x509,dir=scratch/tls/client1,endpoint=client,id=tls0\ > driver=nbd,host=localhost,port=10809,tls-creds=tls0 > read 1048576/1048576 bytes at offset 1048576 > 1 MiB, 1 ops; 0.0430 sec (23.204 MiB/sec and 23.2040 ops/sec) > wrote 1048576/1048576 bytes at offset 1048576 > 1 MiB, 1 ops; 0.0152 sec (65.479 MiB/sec and 65.4793 ops/sec) > Cannot read from TLS channel: Input/output error > > Squashing this in on top of your patch helps, though: > > diff --git i/block/nbd-client.c w/block/nbd-client.c > index 5f63e4b8f15..e7916c78996 100644 > --- i/block/nbd-client.c > +++ w/block/nbd-client.c > @@ -79,7 +79,14 @@ static coroutine_fn void nbd_read_reply_entry(void > *opaque) > assert(s->reply.handle == 0); > ret = nbd_receive_reply(s->ioc, >reply, _err); > if (local_err) { > -error_report_err(local_err); > +/* If we are already quitting, either another error has > + * already been reported, or we requested NBD_CMD_DISC and > + * don't need to report anything further. */ > +if (!s->quit) { > +error_report_err(local_err); > +} else { > +error_free(local_err); > +} > } > if (ret <= 0) { > break; > > But I want to do more testing to make sure I'm not missing out on reporting > an actual error if I add that. Yes, I'd be a little concerned about missing reporting of an error from a command other than NBD_CMD_DISC if we did this. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [Qemu-block] [PATCH 2/6 for-3.1] nbd: stop waiting for a NBD response with NBD_CMD_DISC
On 11/16/18 9:53 AM, Daniel P. Berrangé wrote: When sending a NBD_CMD_DISC message there is no reply expected, however, the nbd_read_eof() coroutine is still waiting for a reply. In a plain NBD connection this doesn't matter as it will just get an EOF, however, on a TLS connection it will get an interrupted TLS data packet. The nbd_read_eof() will then print an error message on the console due to missing reply to NBD_CMD_DISC. This can be seen with qemu-img $ qemu-img info \ --object tls-creds-x509,dir=tlsdir,id=tls0,endpoint=client \ --image-opts driver=nbd,host=127.0.0.1,port=9000,tls-creds=tls0 qemu-img: Cannot read from TLS channel: Input/output error image: nbd://127.0.0.1:9000 file format: nbd virtual size: 10M (10485760 bytes) disk size: unavailable Simply setting the 'quit' flag after sending NBD_CMD_DISC is enough to get the coroutine to stop waiting for a reply and thus supress the error message. Actually, it's not quite enough - once you actually start performing I/O, enough coroutines are kicked off that the error still happens: $ qemu-io -c 'r 1m 1m' -c 'w -P 0x22 1m 1m' --image-opts \ --object tls-creds-x509,dir=scratch/tls/client1,endpoint=client,id=tls0\ driver=nbd,host=localhost,port=10809,tls-creds=tls0 read 1048576/1048576 bytes at offset 1048576 1 MiB, 1 ops; 0.0430 sec (23.204 MiB/sec and 23.2040 ops/sec) wrote 1048576/1048576 bytes at offset 1048576 1 MiB, 1 ops; 0.0152 sec (65.479 MiB/sec and 65.4793 ops/sec) Cannot read from TLS channel: Input/output error Squashing this in on top of your patch helps, though: diff --git i/block/nbd-client.c w/block/nbd-client.c index 5f63e4b8f15..e7916c78996 100644 --- i/block/nbd-client.c +++ w/block/nbd-client.c @@ -79,7 +79,14 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque) assert(s->reply.handle == 0); ret = nbd_receive_reply(s->ioc, >reply, _err); if (local_err) { -error_report_err(local_err); +/* If we are already quitting, either another error has + * already been reported, or we requested NBD_CMD_DISC and + * don't need to report anything further. */ +if (!s->quit) { +error_report_err(local_err); +} else { +error_free(local_err); +} } if (ret <= 0) { break; But I want to do more testing to make sure I'm not missing out on reporting an actual error if I add that. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [PATCH 2/6 for-3.1] nbd: stop waiting for a NBD response with NBD_CMD_DISC
On 11/16/18 9:53 AM, Daniel P. Berrangé wrote: When sending a NBD_CMD_DISC message there is no reply expected, however, the nbd_read_eof() coroutine is still waiting for a reply. In a plain NBD connection this doesn't matter as it will just get an EOF, however, on a TLS connection it will get an interrupted TLS data packet. Does that depend on whether the server tried to gracefully close the TLS connection (and flushed enough packets to do so)? At any rate, while it's nice to wait for the server to cleanly end TLS, we can't control the behavior of all servers, and I agree with your analysis that having the server silently deal with an abrupt shutdown is nicer than polluting stderr, as it's not actually a data loss situation for either the client or the server. The nbd_read_eof() will then print an error message on the console due to missing reply to NBD_CMD_DISC. This can be seen with qemu-img $ qemu-img info \ --object tls-creds-x509,dir=tlsdir,id=tls0,endpoint=client \ --image-opts driver=nbd,host=127.0.0.1,port=9000,tls-creds=tls0 qemu-img: Cannot read from TLS channel: Input/output error image: nbd://127.0.0.1:9000 file format: nbd virtual size: 10M (10485760 bytes) disk size: unavailable Simply setting the 'quit' flag after sending NBD_CMD_DISC is enough to get the coroutine to stop waiting for a reply and thus supress the error s/supress/suppress/ message. This fixes a regression introduced in commit be41c100c0d67f6072ddd4910c4b6f7d239f025f Author: Vladimir Sementsov-Ogievskiy Date: Fri May 26 14:09:13 2017 +0300 nbd/client.c: use errp instead of LOG Move to modern errp scheme from just LOGging errors. Signed-off-by: Vladimir Sementsov-Ogievskiy Message-Id: <20170526110913.89098-1-vsement...@virtuozzo.com> Signed-off-by: Paolo Bonzini Signed-off-by: Daniel P. Berrangé --- block/nbd-client.c | 1 + 1 file changed, 1 insertion(+) Reviewed-by: Eric Blake Agree that this is 3.1 material; will queue. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
[Qemu-block] [PATCH 2/6 for-3.1] nbd: stop waiting for a NBD response with NBD_CMD_DISC
When sending a NBD_CMD_DISC message there is no reply expected, however, the nbd_read_eof() coroutine is still waiting for a reply. In a plain NBD connection this doesn't matter as it will just get an EOF, however, on a TLS connection it will get an interrupted TLS data packet. The nbd_read_eof() will then print an error message on the console due to missing reply to NBD_CMD_DISC. This can be seen with qemu-img $ qemu-img info \ --object tls-creds-x509,dir=tlsdir,id=tls0,endpoint=client \ --image-opts driver=nbd,host=127.0.0.1,port=9000,tls-creds=tls0 qemu-img: Cannot read from TLS channel: Input/output error image: nbd://127.0.0.1:9000 file format: nbd virtual size: 10M (10485760 bytes) disk size: unavailable Simply setting the 'quit' flag after sending NBD_CMD_DISC is enough to get the coroutine to stop waiting for a reply and thus supress the error message. This fixes a regression introduced in commit be41c100c0d67f6072ddd4910c4b6f7d239f025f Author: Vladimir Sementsov-Ogievskiy Date: Fri May 26 14:09:13 2017 +0300 nbd/client.c: use errp instead of LOG Move to modern errp scheme from just LOGging errors. Signed-off-by: Vladimir Sementsov-Ogievskiy Message-Id: <20170526110913.89098-1-vsement...@virtuozzo.com> Signed-off-by: Paolo Bonzini Signed-off-by: Daniel P. Berrangé --- block/nbd-client.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/nbd-client.c b/block/nbd-client.c index 76e9ca3abe..5f63e4b8f1 100644 --- a/block/nbd-client.c +++ b/block/nbd-client.c @@ -961,6 +961,7 @@ void nbd_client_close(BlockDriverState *bs) } nbd_send_request(client->ioc, ); +client->quit = true; nbd_teardown_connection(bs); } -- 2.19.1