Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping

2017-10-06 Thread Kevin Wolf
Am 06.10.2017 um 05:56 hat John Snow geschrieben:
> On 10/05/2017 07:38 AM, Kevin Wolf wrote:
> > Let me try to just consolidate all of the above into a single state
> > machine:
> > 
> > 1.  CREATED --> RUNNING
> > driver callback: .start
> > 2a. RUNNING --> READY | CANCELLED
> > via: auto transition (when bulk copy is finished) / block-job-cancel
> > event: BLOCK_JOB_READY
> > 2b. READY --> READY (COMPLETING) | READY (CANCELLING)
> > via: block-job-complete / block-job-cancel
> > event: none
> > driver callback: .complete / none
> > 3.  READY (CANCELLING | COMPLETING) --> DONE
> > via: auto transition
> >  (CANCELLING: after draining in-flight mirror requests;
> >   COMPLETING: when images are in sync)
> > event: BLOCK_JOB_DONE
> 
> I have some doubts about "DONE" necessarily coming prior to "PENDING" as
> this means that the transaction gives up control of the jobs at this
> point, and then "PENDING" jobs may complete one without the other,
> especially if we introduce a PREPARE() callback.
> 
> (At least, if I've understood you correctly that "DONE" is the phase
> where jobs queue up, ready to be dispatched by the transaction mechanism.)

Yes. This means that DONE is state where a job end up when it has
completed its work, which is generally a different point in time for
each job in the transaction. Something has to come there, and it can't
be PENDING yet because the transaction hasn't completed yet.

In other words, DONE is the inactive state that exists today, but is
externally exposed as RUNNING even though the job isn't actually doing
any work any more.

I don't see though why this means that the transaction has to give up
control?

> I think jobs needs to not clear that transactional hurdle until they've
> been allowed to call prepare so that we can be guaranteed that any
> changes that occur after that point in time will not fail (and cannot
> any longer affect the transactional group.)

The earliest point where the transaction can be removed from the picture
is the first call of block-job-finalize for any job in the transaction.
This is where all jobs of the transaction need to complete at least
their .prepare stage, otherwise this first job can't be finalised.

As we discussed yesterday, block-job-finalize is really an operation on
the whole transaction, but keeping it at the job level in the external
interface spares us managing transactions as named objects.

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping

2017-10-05 Thread Markus Armbruster
Quick drive-by comment:

Kevin Wolf  writes:

[...]
> Let me try to just consolidate all of the above into a single state
> machine:
>
> 1.  CREATED --> RUNNING
> driver callback: .start
> 2a. RUNNING --> READY | CANCELLED
> via: auto transition (when bulk copy is finished) / block-job-cancel
> event: BLOCK_JOB_READY
> 2b. READY --> READY (COMPLETING) | READY (CANCELLING)
> via: block-job-complete / block-job-cancel
> event: none
> driver callback: .complete / none
> 3.  READY (CANCELLING | COMPLETING) --> DONE
> via: auto transition
>  (CANCELLING: after draining in-flight mirror requests;
>   COMPLETING: when images are in sync)
> event: BLOCK_JOB_DONE
> 4.  DONE --> PENDING
> via: auto transition (all jobs in the transaction are DONE)
> event: BLOCK_JOB_PENDING
> 5.  PENDING --> FINISHED
> via: block-job-finalize
> event: COMPLETED | CANCELLED
> driver callback: .prepare_finalize / .commit / .abort
> 6.  FINISHED --> NULL
> via: block-job-reap
> event: none
> driver callback: .clean
>
> I removed COMPLETED/CANCELLED states because they are never externally
> visible. You proposed an "auto transition" there, but the transition
> would be immediately after the previous one, so clients always see
> PENDING --> NULL | FINISHED.
>
> We would have two booleans to make explicit transition automatically:
>
> auto-finalize for block-job-finalize (default: true)
> auto-reap for block-job-reap (default: true)

Are we *sure* we need to quadruple the test matrix?

What exactly is the use case for either of these two flags?

> Both of them would be executed automatically as soon as the respective
> commands would be available.
>
> We could add more auto-* options for the remaining explicit transition

*groan*

> (block-job-complete/cancel in READY), but these are not important for
> the problems we're trying to solve here. They might become interesting
> if we do decide that we want a single copy block job instead of doing
> similar things in mirror, commit and backup.
> The naming needs some improvements (done -> pending -> finished looks
> really odd), but does this make sense otherwise?
>
> Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping

2017-10-05 Thread John Snow


On 10/05/2017 07:38 AM, Kevin Wolf wrote:
> Am 05.10.2017 um 03:46 hat John Snow geschrieben:
>> On 10/04/2017 02:27 PM, Kevin Wolf wrote:
>>> Am 04.10.2017 um 03:52 hat John Snow geschrieben:
 For jobs that complete when a monitor isn't looking, there's no way to
 tell what the job's final return code was. We need to allow jobs to
 remain in the list until queried for reliable management.
>>>
>>> Just a short summary of what I discussed with John on IRC:
>>>
>>> Another important reason why we want to have an explicit end of block
>>> jobs is that job completion often makes changes to the graph. For a
>>> management tool that manages the block graph on a node level, it is a
>>> big problem if graph changes can happen at any point that can lead to
>>> bad race conditions. Giving the management tool control over the end of
>>> the block job makes it aware that graph changes happen.
>>>
>>> This means that compared to this RFC series, we need to move the waiting
>>> earlier in the process:
>>>
>>> 1. Block job is done and calls block_job_completed()
>>> 2. Wait for other block jobs in the same job transaction to complete
>>> 3. Send a (new) QMP event to the management tool to notify it that the
>>>job is ready to be reaped
>>
>> Oh, I suppose to distinguish it from "COMPLETED" in that sense, because
>> it isn't actually COMPLETED anymore under your vision, so it requires a
>> new event in this proposal.
>>
>> This becomes a bit messy, bumping up against both "READY" and a
>> transactional pre-completed state semantically. U, for lack of a
>> better word in the timeframe I'd like to complete this email in, let's
>> call this new theoretical state "PENDING"?
>>
>> So presently, a job goes through the following life cycle:
>>
>> 1. CREATED --> RUNNING
>> 2. RUNNING <--> PAUSED
>> 3. RUNNING --> (READY | COMPLETED | CANCELED)
>> 4. READY --> (COMPLETED | CANCELED)
>> 5. (COMPLETED | CANCELED) --> NULL
>>
>> Where we emit an event upon entering "READY", "COMPLETED" or "CANCELED".
> 
> Roughly yes, but it's not quite true because you can still pause and
> unpause ready jobs. So READY and PAUSED are kind of orthogonal.
> 
>> My patchset here effectively adds a new optional terminal state:
>>
>> 5. (COMPLETED | CANCELED) --> (NULL | FINISHED)
>> 6. FINISHED --> NULL
>>
>> Where the last transition from FINISHED to NULL is performed via
>> block-job-reap, but notably we get to re-use the events for COMPLETED |
>> CANCELED to indicate the availability of this operation to be performed.
>>
>> What happens in the case of transactionally managed jobs presently is
>> that jobs get stuck as they enter the COMPLETED|CANCELED state. If you
>> were to query them they behave as if they're RUNNING. There's no
>> discrete state that exists for this presently.
>>
>> You can cancel these as normal, but I'm not sure if you can pause them,
>> actually. (Note to self, test that.) I think they have almost exactly
>> like any RUNNING job would.
> 
> Except that they don't do any work any more. This is an mportant
> difference for a mirror job which would normally keep copying new writes
> until it sends the COMPLETED event. So when libvirt restarts and it sees
> a "RUNNING" mirror job, it can't decide whether it is still copying
> things or has already completed.
> 
> Looks like this is another reason why we want a separate state here.
> 
>> What you're proposing here is the formalization of the pre-completion
>> state ("PENDING") and that in this state, a job outside of a transaction
>> can exist until it is manually told to finally, once and for all,
>> actually finish its business. We can use this as a hook to perform and
>> last graph changes so they will not come as a surprise to the management
>> application. Maybe this operation should be called "Finalize". Again,
>> for lack of a better term in the timeframe, I'll refer to it as such for
>> now.
> 
> "finalize" doesn't sound too bad.
> 
>> I think importantly this actually distinguishes it from "reap" in that
>> the commit phase can still fail, so we can't let the job follow that
>> auto transition back to the NULL state.
> 
> Let me see if I understand correctly: We want to make sure that the
> management tool sees the final return value for the job. We have already
> decided that events aren't enough for this because the management tool
> could be restarted while we send the event, so the information is lost.
> Having it as a return value of block-job-reap isn't enough either
> because it could be lost the same way. We need a separate phase where
> libvirt can query the return value and from which we don't automatically
> transition away.
> 
> I'm afraid that you are right.
> 
>> That means that we'd need both a block-job-finalize AND a
>> block-job-reap to accomplish both of the following goals:
>>
>> (1) Allow the management application to control graph changes [libvirt]
>> (2) Prevent auto transitions to NULL state for asynchronous clients [A
>> requi