Re: [PATCH 0/3] block/nbd: fix crashers in reconnect while migrating

2021-01-29 Thread Vladimir Sementsov-Ogievskiy

29.01.2021 10:35, Roman Kagan wrote:

On Fri, Jan 29, 2021 at 08:51:39AM +0300, Vladimir Sementsov-Ogievskiy wrote:

28.01.2021 23:14, Roman Kagan wrote:

During the final phase of migration the NBD reconnection logic may
encounter situations it doesn't expect during regular operation.

This series addresses some of them that make qemu crash.  They are
reproducible when a vm with a secondary drive attached via nbd with
non-zero "reconnect-delay" runs a stress load (fio with big queue depth)
in the guest on that drive and is migrated (e.g. to a file), while the
nbd server is SIGKILL-ed and restarted every second.

See the individual patches for specific crash conditions and more
detailed explanations.

Roman Kagan (3):
block/nbd: only detach existing iochannel from aio_context
block/nbd: only enter connection coroutine if it's present
nbd: make nbd_read* return -EIO on error

   include/block/nbd.h |  7 ---
   block/nbd.c | 25 +
   2 files changed, 21 insertions(+), 11 deletions(-)



Thanks a lot for fixing!

Do you have some reproducer scripts? Could you post them or may be add
an iotest?


I don't have it scripted, just ad hoc command lines.  I'll look into
making up a test.  Can you perhaps suggest what existing test to base
on?



For now reconnect feature is covered only by two tests tests/qemu-iotests/264 
and tests/qemu-iotests/277.

Also note, that since "f203080bbd iotests: rewrite check into python" you should add new 
iotests with human-readable file names into tests/qemu-iotests/tests subdirectory. Also you don't 
need to update tests/qemu-iotests/group file (it's absent now), test groups are defined in tests 
themselves in a comment, like "# group: rw quick".

--
Best regards,
Vladimir



Re: [PATCH 0/3] block/nbd: fix crashers in reconnect while migrating

2021-01-28 Thread Roman Kagan
On Fri, Jan 29, 2021 at 08:51:39AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 28.01.2021 23:14, Roman Kagan wrote:
> > During the final phase of migration the NBD reconnection logic may
> > encounter situations it doesn't expect during regular operation.
> > 
> > This series addresses some of them that make qemu crash.  They are
> > reproducible when a vm with a secondary drive attached via nbd with
> > non-zero "reconnect-delay" runs a stress load (fio with big queue depth)
> > in the guest on that drive and is migrated (e.g. to a file), while the
> > nbd server is SIGKILL-ed and restarted every second.
> > 
> > See the individual patches for specific crash conditions and more
> > detailed explanations.
> > 
> > Roman Kagan (3):
> >block/nbd: only detach existing iochannel from aio_context
> >block/nbd: only enter connection coroutine if it's present
> >nbd: make nbd_read* return -EIO on error
> > 
> >   include/block/nbd.h |  7 ---
> >   block/nbd.c | 25 +
> >   2 files changed, 21 insertions(+), 11 deletions(-)
> > 
> 
> Thanks a lot for fixing!
> 
> Do you have some reproducer scripts? Could you post them or may be add
> an iotest?

I don't have it scripted, just ad hoc command lines.  I'll look into
making up a test.  Can you perhaps suggest what existing test to base
on?

Thanks,
Roman.



Re: [PATCH 0/3] block/nbd: fix crashers in reconnect while migrating

2021-01-28 Thread Vladimir Sementsov-Ogievskiy

28.01.2021 23:14, Roman Kagan wrote:

During the final phase of migration the NBD reconnection logic may
encounter situations it doesn't expect during regular operation.

This series addresses some of them that make qemu crash.  They are
reproducible when a vm with a secondary drive attached via nbd with
non-zero "reconnect-delay" runs a stress load (fio with big queue depth)
in the guest on that drive and is migrated (e.g. to a file), while the
nbd server is SIGKILL-ed and restarted every second.

See the individual patches for specific crash conditions and more
detailed explanations.

Roman Kagan (3):
   block/nbd: only detach existing iochannel from aio_context
   block/nbd: only enter connection coroutine if it's present
   nbd: make nbd_read* return -EIO on error

  include/block/nbd.h |  7 ---
  block/nbd.c | 25 +
  2 files changed, 21 insertions(+), 11 deletions(-)



Thanks a lot for fixing!

Do you have some reproducer scripts? Could you post them or may be add an 
iotest?

--
Best regards,
Vladimir



[PATCH 0/3] block/nbd: fix crashers in reconnect while migrating

2021-01-28 Thread Roman Kagan
During the final phase of migration the NBD reconnection logic may
encounter situations it doesn't expect during regular operation.

This series addresses some of them that make qemu crash.  They are
reproducible when a vm with a secondary drive attached via nbd with
non-zero "reconnect-delay" runs a stress load (fio with big queue depth)
in the guest on that drive and is migrated (e.g. to a file), while the
nbd server is SIGKILL-ed and restarted every second.

See the individual patches for specific crash conditions and more
detailed explanations.

Roman Kagan (3):
  block/nbd: only detach existing iochannel from aio_context
  block/nbd: only enter connection coroutine if it's present
  nbd: make nbd_read* return -EIO on error

 include/block/nbd.h |  7 ---
 block/nbd.c | 25 +
 2 files changed, 21 insertions(+), 11 deletions(-)

-- 
2.29.2