Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete

Max Reitz Fri, 09 Apr 2021 03:19:37 -0700

On 09.04.21 12:07, Vladimir Sementsov-Ogievskiy wrote:

09.04.2021 12:51, Max Reitz wrote:
On 08.04.21 19:26, Vladimir Sementsov-Ogievskiy wrote:
08.04.2021 20:04, John Snow wrote:
On 4/8/21 12:58 PM, Vladimir Sementsov-Ogievskiy wrote:
job-complete command is async. Can we instead just add a boolean like job->completion_requested, set it if job-complete is called in STANDBY state, and on job_resume have job_complete called automatically if this boolean is true?
job_complete has a synchronous setup, though -- we lose out on a lot of synchronous error checking in that circumstance.
yes, that's a problem..
I was not able to audit it to determine that it'd be safe to attempt that setup during a drained section -- I imagine it won't work and will fail, though.
So I thought we'd have to signal completion and run the setup *later*, but what do we do if we get an error then? Does the entire job fail? Do we emit some new event? ("BLOCK_JOB_COMPLETION_FAILED"?) Is it recoverable?
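As a rough illustration of the deferred-completion idea discussed above, the sketch below models a job that merely records a job-complete request while it is in STANDBY and acts on it when resumed. Everything in it (SketchJob, sketch_job_complete(), sketch_job_resume(), the status values) is hypothetical and heavily simplified; only the completion_requested flag name comes from the proposal, and none of this is QEMU's actual job API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified job model; not QEMU's real Job struct. */
typedef enum {
    SKETCH_JOB_READY,
    SKETCH_JOB_STANDBY,    /* paused after having reached READY */
    SKETCH_JOB_COMPLETING,
} SketchJobStatus;

typedef struct {
    SketchJobStatus status;
    bool completion_requested;  /* flag name taken from the proposal */
} SketchJob;

/* Stand-in for the real completion setup (for mirror: mirror_complete()). */
static void sketch_do_complete(SketchJob *job)
{
    job->status = SKETCH_JOB_COMPLETING;
}

/* job-complete: while the job is paused, only record the request. */
static void sketch_job_complete(SketchJob *job)
{
    if (job->status == SKETCH_JOB_STANDBY) {
        job->completion_requested = true;
        return;
    }
    sketch_do_complete(job);
}

/* Resuming honours a completion that was requested while paused. */
static void sketch_job_resume(SketchJob *job)
{
    assert(job->status == SKETCH_JOB_STANDBY);
    job->status = SKETCH_JOB_READY;
    if (job->completion_requested) {
        job->completion_requested = false;
        sketch_do_complete(job);
    }
}
```

The objection raised above shows up directly: the real setup now runs inside sketch_job_resume(), so any error it produces can no longer be returned to the caller of job-complete.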
Isn't it possible even now that, after a successful job-complete, the job still fails and we report BLOCK_JOB_COMPLETED with an error?
And actually, how much benefit does the user get from the fact that job-complete may fail?
We can make job-complete a simple always-success boolean flag setter like job-pause.
I wanted to say the following:

  But job-pause does always succeed, in contrast to block-job-complete.

  block-job-complete is more akin to job-finalize, which too is a
  synchronous operation.
But when I wrote that last sentence, I asked myself whether what mirror_complete() does isn’t actually a remnant of what we had to do when we didn’t have job-finalize yet. Shouldn’t that all be in mirror_exit_common()? What’s the advantage of opening the backing chain or putting blockers on the to-replace node in block-job-complete? Aren’t those all graph-changing operations, basically, i.e. stuff that should be done in job-finalize?
If we move everything to mirror_exit_common(), all that remains to do is basically set some should_complete flag (could even be part of the Job struct), and then the whole problem disappears.
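A minimal sketch of the alternative described above, assuming a simplified model: block-job-complete only sets a should_complete flag, and the fallible graph change (replacing the source by the target) happens in the exit path, which is where any error would surface. SketchMirror, sketch_complete() and sketch_prepare() are hypothetical names; this is not mirror_exit_common() itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; field and function names are illustrative only. */
typedef struct {
    bool should_complete;    /* set by block-job-complete; cannot fail */
    bool source_replaced;    /* outcome of the graph change */
    bool replace_will_fail;  /* knob to simulate a failing graph change */
} SketchMirror;

/* block-job-complete under this proposal: a plain flag setter, like job-pause. */
static void sketch_complete(SketchMirror *m)
{
    m->should_complete = true;
}

/*
 * Stand-in for the exit/prepare path: the graph-changing work (replacing
 * the source by the target) happens here, so this is where an error is
 * reported (returned as -1 in this sketch).
 */
static int sketch_prepare(SketchMirror *m)
{
    if (!m->should_complete) {
        return 0;  /* completion never requested: nothing to replace */
    }
    assert(!m->source_replaced);  /* prepare runs at most once */
    if (m->replace_will_fail) {
        return -1;
    }
    m->source_replaced = true;
    return 0;
}
```

Since the flag setter cannot fail, nothing needs to be validated while the job is paused; the price is that every error is reported only when the job finalizes.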
Thoughts?
Sounds good.. But first I want to understand one simple thing: can the job fail even after block-job-complete succeeded?


Sure, if you get an I/O error afterwards.

As I understand it, current users think that it can't. And block-job-complete is documented as "This command completes an active background block operation synchronously". So it's assumed that if block-job-complete succeeded, we are totally done.

I think the only thing that block-job-complete does is signal to the job that it should exit once source and target have converged again. (The READY event just says that source and target have converged once already.)

(Only in write-blocking copy mode is there a guarantee of source and target remaining converged after READY.)
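The convergence semantics described above can be modelled, as a hypothetical sketch, with a dirty-chunk counter: after completion is requested, the job exits only once no dirty data remains, while in background mode guest writes can keep adding new dirty data. SketchConvergence, sketch_guest_write() and sketch_iteration() are illustrative names; the real mirror job tracks dirty regions with a dirty bitmap rather than a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of mirror convergence; sizes are abstract "chunks". */
typedef struct {
    int dirty;             /* chunks that differ between source and target */
    bool should_complete;  /* set by block-job-complete */
    bool write_blocking;   /* write-blocking copy mode */
} SketchConvergence;

/*
 * A guest write: in background mode it leaves new dirty data behind;
 * in write-blocking mode it is copied to the target before it completes.
 */
static void sketch_guest_write(SketchConvergence *c)
{
    if (!c->write_blocking) {
        c->dirty++;
    }
}

/*
 * One iteration of the job's main loop: copy up to `budget` dirty chunks,
 * then report whether the job may exit (completion requested AND converged).
 */
static bool sketch_iteration(SketchConvergence *c, int budget)
{
    assert(budget > 0);
    int n = c->dirty < budget ? c->dirty : budget;
    c->dirty -= n;
    return c->should_complete && c->dirty == 0;
}
```

With write_blocking set, guest writes never increase the counter, matching the point that only write-blocking mode keeps source and target converged after READY.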

Well, and of course mirror_complete() also does a couple of things that prepare replacing the source by the target.

But maybe that's wrong? Can mirror_prepare fail after mirror_complete succeeded?

Oh, definitely. For example, mirror_prepare replaces the source by the target, which can definitely fail. (See mirror_exit_common().)

And the user must check the job status after the job is finalized? Or check the error in the BLOCK_JOB_COMPLETED event?

If the BLOCK_JOB_COMPLETED event shows an error, then the job doesn’t even try to complete. If there is an error on job-finalize, source and target have converged (so the target is consistent), but the source most likely couldn’t be replaced by the target.

I suppose in practice, if anything goes wrong, libvirt just shows an error and that’s it, no matter where exactly the error occurs.

Max
