On Wed, Nov 02, 2016 at 01:50:52PM -0400, John Snow wrote: > Cleaning up after we have deferred to the main thread but before the > transaction has converged can be dangerous and result in deadlocks > if the job cleanup invokes any BH polling loops. > > A job may attempt to begin cleaning up, but may induce another job to > enter its cleanup routine. The second job, part of our same transaction, > will block waiting for the first job to finish, so neither job may now > make progress. > > To rectify this, allow jobs to register a cleanup operation that will > always run regardless of if the job was in a transaction or not, and > if the transaction job group completed successfully or not. > > Move sensitive cleanup to this callback instead which is guaranteed to > be run only after the transaction has converged, which removes sensitive > timing constraints from said cleanup. > > Furthermore, in future patches these cleanup operations will be performed > regardless of whether or not we actually started the job. Therefore, > cleanup callbacks should essentially confine themselves to undoing create > operations, e.g. setup actions taken in what is now backup_start. > > Reported-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> > Signed-off-by: John Snow <js...@redhat.com> > --- > block/backup.c | 15 ++++++++++----- > blockjob.c | 3 +++ > include/block/blockjob_int.h | 8 ++++++++ > 3 files changed, 21 insertions(+), 5 deletions(-) > > diff --git a/block/backup.c b/block/backup.c > index 7b5d8a3..734a24c 100644 > --- a/block/backup.c > +++ b/block/backup.c > @@ -242,6 +242,14 @@ static void backup_abort(BlockJob *job) > } > } > > +static void backup_clean(BlockJob *job) > +{ > + BackupBlockJob *s = container_of(job, BackupBlockJob, common); > + assert(s->target); > + blk_unref(s->target); > + s->target = NULL; > +} > + > static void backup_attached_aio_context(BlockJob *job, AioContext > *aio_context) > { > BackupBlockJob *s = container_of(job, BackupBlockJob, common); > @@ -321,6 +329,7 @@ static const BlockJobDriver backup_job_driver = { > .set_speed = backup_set_speed, > .commit = backup_commit, > .abort = backup_abort, > + .clean = backup_clean, > .attached_aio_context = backup_attached_aio_context, > .drain = backup_drain, > }; > @@ -343,12 +352,8 @@ typedef struct { > > static void backup_complete(BlockJob *job, void *opaque) > { > - BackupBlockJob *s = container_of(job, BackupBlockJob, common); > BackupCompleteData *data = opaque; > > - blk_unref(s->target); > - s->target = NULL; > - > block_job_completed(job, data->ret); > g_free(data); > } > @@ -658,7 +663,7 @@ void backup_start(const char *job_id, BlockDriverState > *bs, > bdrv_reclaim_dirty_bitmap(bs, sync_bitmap, NULL); > } > if (job) { > - blk_unref(job->target); > + backup_clean(&job->common); > block_job_unref(&job->common); > } > } > diff --git a/blockjob.c b/blockjob.c > index 4d0ef53..e3c458c 100644 > --- a/blockjob.c > +++ b/blockjob.c > @@ -241,6 +241,9 @@ static void block_job_completed_single(BlockJob *job) > job->driver->abort(job); > } > } > + if (job->driver->clean) { > + job->driver->clean(job); > + } > > if (job->cb) { > job->cb(job->opaque, job->ret); > diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h > index 40275e4..60d91a0 100644 > --- a/include/block/blockjob_int.h > +++ b/include/block/blockjob_int.h > @@ -74,6 +74,14 @@ struct BlockJobDriver { > void (*abort)(BlockJob *job); > > /** > + * If the callback is not NULL, it will be invoked after a call to either > + * .commit() or .abort(). Regardless of which callback is invoked after > + * completion, .clean() will always be called, even if the job does not > + * belong to a transaction group. > + */ > + void (*clean)(BlockJob *job); > + > + /** > * If the callback is not NULL, it will be invoked when the job > transitions > * into the paused state. Paused jobs must not perform any asynchronous > * I/O or event loop activity. This callback is used to quiesce jobs. > -- > 2.7.4 >
Reviewed-by: Jeff Cody <jc...@redhat.com>