On 19/04/2016 16:31, Stefan Hajnoczi wrote:
> On Fri, Apr 15, 2016 at 01:31:59PM +0200, Paolo Bonzini wrote:
>> @@ -255,6 +257,8 @@ aio_ctx_finalize(GSource     *source)
>>      }
>>  #endif
>>  
>> +    qemu_bh_delete(ctx->schedule_bh);
> 
> Please include an assertion that the scheduled coroutines list is empty.

Good idea.

>> +    while (!QSLIST_EMPTY(&straight)) {
>> +        Coroutine *co = QSLIST_FIRST(&straight);
>> +        QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
>> +        trace_aio_schedule_bh_cb(ctx, co);
>> +        qemu_coroutine_enter(co, NULL);

FWIW, this should be wrapped with aio_context_acquire/aio_context_release.

>> +    }
>> +}
> 
> This construct brings to mind the use-after-free case when a scheduled
> coroutine terminates before it is entered by this loop:
> 
> There are two scheduled Coroutines: A and B.  During
> qemu_coroutine_enter(A) we enter B.  B then terminates by returning from
> its main function.  Once A yields or terminates we still try to enter
> the freed B coroutine.
> 
> Unfortunately I don't think we have good debugging or an assertion for
> this bug.  I'm sure it will occur at some point...

aio_co_schedule (and qemu_coroutine_wake which wraps it later in the
series) is quite a low-level interface, so I do not expect many users.

That said, there is at least another case where it will be used.  In the
dataplane branch, where AIO callbacks take the AioContext mutex
themselves, we have:

static void bdrv_co_io_em_complete(void *opaque, int ret)
{
    CoroutineIOCompletion *co = opaque;

    co->ret = ret;
    aio_context_acquire(co->ctx);
    qemu_coroutine_enter(co->coroutine, NULL);
    aio_context_release(co->ctx);
}

...

    acb = bs->drv->bdrv_aio_readv(bs, sector_num, iov, nb_sectors,
                                  bdrv_co_io_em_complete, &co);
    qemu_coroutine_yield();

bdrv_co_io_em_complete here can be called before the coroutine has
yielded.  To prepare for the replacement of the AioContext mutex with
fine-grained mutexes, I think bdrv_co_io_em_complete should do something
like

    if (ctx != qemu_get_current_aio_context()) {
        aio_co_schedule(ctx, co->coroutine);
        return;
    }

    aio_context_acquire(ctx);
    qemu_coroutine_enter(co->coroutine, NULL);
    aio_context_release(ctx);


> Please document
> that the coroutine must not be entered by anyone else while
> aio_co_schedule() is active.

Sure.

Paolo

Reply via email to