On 2018-05-15 14:17, Kevin Wolf wrote:
> Am 14.05.2018 um 17:52 hat Max Reitz geschrieben:
>> On 2018-05-09 18:26, Kevin Wolf wrote:
>>> Signed-off-by: Kevin Wolf <[email protected]>
>>> ---
>>>  include/block/blockjob.h     |  5 ----
>>>  include/block/blockjob_int.h | 19 --------------
>>>  include/qemu/job.h           | 20 ++++++++++++++++
>>>  block/backup.c               |  7 +++---
>>>  block/commit.c               | 11 +++++----
>>>  block/mirror.c               | 15 ++++++------
>>>  block/stream.c               | 14 +++++------
>>>  blockjob.c                   | 57 ++++----------------------------------------
>>>  job.c                        | 33 +++++++++++++++++++++++++
>>>  tests/test-bdrv-drain.c      |  7 +++---
>>>  tests/test-blockjob-txn.c    | 13 +++++-----
>>>  tests/test-blockjob.c        |  7 +++---
>>>  12 files changed, 98 insertions(+), 110 deletions(-)
>>
>> [...]
>>
>>> diff --git a/block/commit.c b/block/commit.c
>>> index 85baea8f92..d326766e4d 100644
>>> --- a/block/commit.c
>>> +++ b/block/commit.c
>>> @@ -72,9 +72,10 @@ typedef struct {
>>>      int ret;
>>>  } CommitCompleteData;
>>>  
>>> -static void commit_complete(BlockJob *job, void *opaque)
>>> +static void commit_complete(Job *job, void *opaque)
>>>  {
>>> -    CommitBlockJob *s = container_of(job, CommitBlockJob, common);
>>> +    CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
>>
>> Now this is just abuse.
>>
>> ...but it's not the first time someone packs two container_of() into
>> one, it appears.  So, whatever, I guess.
>
> I don't think it's abuse. Why wouldn't I directly cast to the type that
> I really want instead of casting to each of the uninteresting parent
> classes, too?
Because the final parameter is called "member" and not "path". :-)
>>> @@ -170,3 +171,35 @@ void job_unref(Job *job)
>>> g_free(job);
>>> }
>>> }
>>> +
>>> +typedef struct {
>>> + Job *job;
>>> + JobDeferToMainLoopFn *fn;
>>> + void *opaque;
>>> +} JobDeferToMainLoopData;
>>> +
>>> +static void job_defer_to_main_loop_bh(void *opaque)
>>> +{
>>> + JobDeferToMainLoopData *data = opaque;
>>> + Job *job = data->job;
>>> + AioContext *aio_context = job->aio_context;
>>> +
>>> + /* Prevent race with job_defer_to_main_loop() */
>>> + aio_context_acquire(aio_context);
>>
>> I don't have a good feeling about this. The original code had this
>> comment above an aio_context_acquire() for a context that might
>> decidedly not have anything to do with the BB's context;
>> block_job_defer_to_main_loop()'s description was that it would acquire
>> the latter, so why did it acquire the former at all?
>>
>> We wouldn't need this comment here at all, because acquiring this
>> AioContext is part of the interface. That's why I don't have a good
>> feeling.
>
> To be honest, I don't fully understand what the old code was trying to
> do or what race it was talking about, because I don't see any potential
> race with job_defer_to_main_loop() (neither in the old nor in the new
> code).
>
> My suspicion is that Stefan solved the race that you reported [1] (and
> which was not with job_defer_to_main_loop(), but with random code that
> runs between that and the BH) just by adding some more code that made
> the scenario safe, but missed that this also made the existing code
> redundant. In other words, I think taking data->aio_context didn't serve
> a purpose even in the old code.
Possible, yes.
Also seems more likely than any of the explanations I tried to come up with.
> The only thing it could possibly protect is the access of data->job->bs,
> but that's not something that changes between job_defer_to_main_loop()
> and the execution of the BH.
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg00260.html
>
>> The best explanation I can come up with is that the original code
>> acquired the AioContext both of the block device at the time of the BH
>> (because that needs to be done), and at the time of
>> block_job_defer_to_main_loop() -- because the latter is probably the
>> context the block_job_defer_to_main_loop() call came from, so it should
>> be (b)locked.
>>
>> But if that's the case, then the same should be done here. The context
>> of the job may change between scheduling the BH and the BH being
>> executed, so we might lock a different context here than the one
>> job_defer_to_main_loop() ran in (i.e., job->aio_context at the time of
>> job_defer_to_main_loop() running). And maybe we should lock that old
>> context, too -- just like block_job_defer_to_main_loop_bh() did.
>
> Why should we lock the old context? We don't access anything protected
> by it. Even the data->job->bs access has gone away because we now have
> job->aio_context.
Because the old code did so and I don't know why. O:-)
Your explanation does make sense to me, though, so:
Reviewed-by: Max Reitz <[email protected]>
