Am 24.02.2018 um 00:51 hat John Snow geschrieben: > Presently, even if a job is canceled post-completion as a result of > a failing peer in a transaction, it will still call .commit because > nothing has updated or changed its return code. > > The reason why this does not cause problems currently is because > backup's implementation of .commit checks for cancellation itself. > > I'd like to simplify this contract: > > (1) Abort is called if the job/transaction fails > (2) Commit is called if the job/transaction succeeds > > To this end: A job's return code, if 0, will be forcibly set as > -ECANCELED if that job has already concluded. Remove the now > redundant check in the backup job implementation.
Thanks, another fix that addresses a comment I made on an earlier patch! > We need to check for cancellation in both block_job_completed > AND block_job_completed_single, because jobs may be cancelled between > those two calls; for instance in transactions. > > The check in block_job_completed could be removed, but there's no > point in starting to attempt to succeed a transaction that we know > in advance will fail. > > This does NOT affect mirror jobs that are "canceled" during their > synchronous phase. The mirror job itself forcibly sets the canceled > property to false prior to ceding control, so such cases will invoke > the "commit" callback. Which is a sign that mirror is abusing the interface. But yes, we need to keep it working. > Signed-off-by: John Snow <js...@redhat.com> > --- > block/backup.c | 2 +- > block/trace-events | 1 + > blockjob.c | 19 +++++++++++++++---- > 3 files changed, 17 insertions(+), 5 deletions(-) > > diff --git a/block/backup.c b/block/backup.c > index 7e254dabff..453cd62c24 100644 > --- a/block/backup.c > +++ b/block/backup.c > @@ -206,7 +206,7 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob > *job, int ret) > BdrvDirtyBitmap *bm; > BlockDriverState *bs = blk_bs(job->common.blk); > > - if (ret < 0 || block_job_is_cancelled(&job->common)) { > + if (ret < 0) { > /* Merge the successor back into the parent, delete nothing. */ > bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL); > assert(bm); > diff --git a/block/trace-events b/block/trace-events > index 266afd9e99..5e531e0310 100644 > --- a/block/trace-events > +++ b/block/trace-events > @@ -5,6 +5,7 @@ bdrv_open_common(void *bs, const char *filename, int flags, > const char *format_n > bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d" > > # blockjob.c > +block_job_completed(void *job, int ret, int jret) "job %p ret %d corrected > ret %d" > block_job_state_transition(void *job, int ret, const char *legal, const > char *s0, const char *s1) "job %p (ret: %d) attempting %s transition > (%s-->%s)" > block_job_apply_verb(void *job, const char *state, const char *verb, const > char *legal) "job %p in state %s; applying verb %s (%s)" > > diff --git a/blockjob.c b/blockjob.c > index 4d29391673..ef17dea004 100644 > --- a/blockjob.c > +++ b/blockjob.c > @@ -384,13 +384,22 @@ void block_job_start(BlockJob *job) > bdrv_coroutine_enter(blk_bs(job->blk), job->co); > } > > +static void block_job_update_rc(BlockJob *job) > +{ > + if (!job->ret && block_job_is_cancelled(job)) { > + job->ret = -ECANCELED; > + } > + if (job->ret) { > + block_job_state_transition(job, BLOCK_JOB_STATUS_ABORTING); > + } > +} > + > static void block_job_completed_single(BlockJob *job) > { > assert(job->completed); > > - if (job->ret || block_job_is_cancelled(job)) { > - block_job_state_transition(job, BLOCK_JOB_STATUS_ABORTING); > - } > + /* Ensure abort is called for late-transactional failures */ > + block_job_update_rc(job); > > if (!job->ret) { > if (job->driver->commit) { > @@ -898,7 +907,9 @@ void block_job_completed(BlockJob *job, int ret) > assert(blk_bs(job->blk)->job == job); > job->completed = true; > job->ret = ret; > - if (ret < 0 || block_job_is_cancelled(job)) { > + block_job_update_rc(job); Hmm... We are transitioning to ABORTED here now. This means that in block_job_completed_single() we get an ABORTED -> ABORTED transition. A bit ugly, but not really a problem. If you decide to remove the s0 == s1 special case in block_job_state_transition(), you'll need to allow this transition in the table. Reviewed-by: Kevin Wolf <kw...@redhat.com>