On 08/08/2017 09:29 AM, Vladimir Sementsov-Ogievskiy wrote:
> Do not communicate after the first error to avoid communicating throught
> broken channel. The only exclusion is try to send NBD_CMD_DISC anyway on
> in nbd_client_close.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
> ---
> 
> Hi all. Here is a patch, fixing a problem noted in
> [PATCH 06/17] block/nbd-client: fix nbd_read_reply_entry
> and
> [PATCH 17/17] block/nbd-client: always return EIO on and after the first io 
> channel error
> and discussed on list.
> 
> If it will be applied to 2.10, then I'll rebase my 'nbd client refactoring 
> and fixing'
> on it (for 2.11). If not - I'll prefer not rebase the series, so, do not 
> apply this
> patch for 2.11.

It may be possible to do something even smaller:

> 
>  block/nbd-client.h |  1 +
>  block/nbd-client.c | 58 
> +++++++++++++++++++++++++++++++++++++-----------------
>  2 files changed, 41 insertions(+), 18 deletions(-)
> 
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index df80771357..28db9922c8 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -29,6 +29,7 @@ typedef struct NBDClientSession {
>  
>      Coroutine *recv_coroutine[MAX_NBD_REQUESTS];
>      NBDReply reply;
> +    bool eio_to_all;
>  } NBDClientSession;
>  
>  NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 25dd28406b..1282b2484e 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -49,6 +49,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>  {
>      NBDClientSession *client = nbd_get_client_session(bs);
>  
> +    client->eio_to_all = true;
> +
>      if (!client->ioc) { /* Already closed */
>          return;
>      }
> @@ -74,12 +76,16 @@ static coroutine_fn void nbd_read_reply_entry(void 
> *opaque)
>      Error *local_err = NULL;
>  
>      for (;;) {
> +        if (s->eio_to_all) {
> +            break;
> +        }
> +

How is this conditional ever reached? nbd_read_reply_entry() is our
workhorse, that is supposed to run until the client is ready to disconnect.

>          assert(s->reply.handle == 0);
>          ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
>          if (ret < 0) {
>              error_report_err(local_err);
>          }
> -        if (ret <= 0) {
> +        if (ret <= 0 || s->eio_to_all) {
>              break;
>          }

This says we're now supposed to break out of the while loop.  So unless
something can set s->eio_to_all during
aio_co_wake(s->recv_coroutine[i]), the next iteration of the loop won't
hit the first conditional.

Meanwhile, even if we skip the first conditional, this second
conditional will still end the loop prior to acting on anything received
from the server (the difference being whether we called
nbd_receive_reply() one additional time, but I don't see that as getting
in the way of a clean exit).

But my question is whether we can just go with a simpler fix: if we ever
break out of the workhorse loop of nbd_read_reply_entry(), then (and
only then) is when we call nbd_recv_coroutines_enter_all() to mop up any
pending transactions before tearing down the coroutines.  Is there
something we can do in nbd_recv_coroutines_enter_all() that will
guarantee that our final entry into each coroutine will fail?

I'm still playing with the idea locally.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to