Re: [Qemu-devel] [Qemu-block] [PATCH 2/3] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()

2017-05-18 Thread Kevin Wolf
Am 18.05.2017 um 10:06 hat Stefan Hajnoczi geschrieben:
> On Thu, May 18, 2017 at 8:57 AM, Kevin Wolf  wrote:
> > Am 17.05.2017 um 22:16 hat Eric Blake geschrieben:
> >> On 05/17/2017 12:09 PM, Stefan Hajnoczi wrote:
> >> > Calling aio_poll() directly may have been fine previously, but this is
> >> > the future, man!
> >>
> >> lol
> >>
> >> >  The difference between an aio_poll() loop and
> >> > BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext
> >> > around aio_poll().
> >> >
> >> > This allows the IOThread to run fd handlers or BHs to complete the
> >> > request.  Failure to release the AioContext causes deadlocks.
> >> >
> >> > Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object
> >> > iothread.
> >>
> >> I'm surprised at how many separate hangs we actually had!
> >
> > How hard would it be to write some test cases for this? Dataplane has
> > a serious lack of automated testing.
> 
> And this hang doesn't even require in-flight guest I/O, so it would be
> easy to reproduce in qemu-iotests if we add an IOThread mode.

I don't think I would make it a separate mode (because people would run
only one or the other - when did you last run qcow2 v2 tests?), but just
add some test cases that make use of it during the normal -qcow2 or -raw
run.

Kevin



Re: [Qemu-devel] [Qemu-block] [PATCH 2/3] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()

2017-05-18 Thread Stefan Hajnoczi
On Thu, May 18, 2017 at 8:57 AM, Kevin Wolf  wrote:
> Am 17.05.2017 um 22:16 hat Eric Blake geschrieben:
>> On 05/17/2017 12:09 PM, Stefan Hajnoczi wrote:
>> > Calling aio_poll() directly may have been fine previously, but this is
>> > the future, man!
>>
>> lol
>>
>> >  The difference between an aio_poll() loop and
>> > BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext
>> > around aio_poll().
>> >
>> > This allows the IOThread to run fd handlers or BHs to complete the
>> > request.  Failure to release the AioContext causes deadlocks.
>> >
>> > Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object
>> > iothread.
>>
>> I'm surprised at how many separate hangs we actually had!
>
> How hard would it be to write some test cases for this? Dataplane has
> a serious lack of automated testing.

And this hang doesn't even require in-flight guest I/O, so it would be
easy to reproduce in qemu-iotests if we add an IOThread mode.

Stefan



Re: [Qemu-devel] [Qemu-block] [PATCH 2/3] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()

2017-05-18 Thread Stefan Hajnoczi
On Wed, May 17, 2017 at 9:16 PM, Eric Blake  wrote:
> On 05/17/2017 12:09 PM, Stefan Hajnoczi wrote:
>> diff --git a/block/io.c b/block/io.c
>> index cc56e90..f0041cd 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -2031,9 +2031,7 @@ bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector 
>> *qiov, int64_t pos,
>>  Coroutine *co = qemu_coroutine_create(bdrv_co_rw_vmstate_entry, 
>> &data);
>>
>>  bdrv_coroutine_enter(bs, co);
>> -while (data.ret == -EINPROGRESS) {
>> -aio_poll(bdrv_get_aio_context(bs), true);
>> -}
>> +BDRV_POLL_WHILE(bs, data.ret == -EINPROGRESS);
>>  return data.ret;
>>  }
>
> Do we have other culprits (not necessarily for vmsave, but for other
> situations), where we should be using BDRV_POLL_WHILE in separate
> patches? For example, a quick grep show that at least hw/9pfs/9p.c makes
> a direct call to aio_poll(,true) in a while loop.

virtio-9p doesn't use IOThread (dataplane) so it is not affected.  It
also doesn't use BlockDriverState so it cannot use BDRV_POLL_WHILE().

I did grep for other aio_poll() callers and didn't spot any obvious
cases that need to be fixed.  They tend to run in situations where
this deadlock cannot occur.

Stefan