John Snow <js...@redhat.com> writes: > On 04/17/2018 09:44 AM, Markus Armbruster wrote: >> John Snow <js...@redhat.com> writes: >> >>> This series seeks to address two distinct but closely related issues >>> concerning the job management API. >>> >>> (1) For jobs that complete when a monitor is not attached and receiving >>> events or notifications, there's no way to discern the job's final >>> return code. Jobs must remain in the query list until dismissed >>> for reliable management. >>> >>> (2) Jobs that change the block graph structure at an indeterminate point >>> after the job starts compete with the management layer that relies >>> on that graph structure to issue meaningful commands. >>> >>> This structure should change only at the behest of the management >>> API, and not asynchronously at unknown points in time. Before a job >>> issues such changes, it must rely on explicit and synchronous >>> confirmation from the management API. >>> >>> These changes are implemented by formalizing a State Transition Machine >>> for the BlockJob subsystem. >>> >>> Job States: >>> >>> UNDEFINED Default state. Internal state only. >>> CREATED Job has been created >>> RUNNING Job has been started and is running >>> PAUSED Job is not ready and has been paused >>> READY Job is ready and is running >>> STANDBY Job is ready and is paused >>> >>> WAITING Job is waiting on peers in transaction >>> PENDING Job is waiting on ACK from QMP >>> ABORTING Job is aborting or has been cancelled >>> CONCLUDED Job has finished and has a retcode available >>> NULL Job is being dismantled. Internal state only. >>> >>> Job Verbs: >>> > > Backporting your quote up here: > >> For each job verb and job state: what's the new job state? >> > > That's not always 1:1, though I tried to address it in the commit messages.
Let me rephrase my question then. For each job verb and job state: what are the possible new job states? If there's more than one, what's the condition for each? I appreciate commit messages explaining that, but having complete state machine documentation in one place (a comment or in docs/) would be nice, wouldn't it? >>> CANCEL Instructs a running job to terminate with error, >>> (Except when that job is READY, which produces no error.) > > CANCEL will take a job to either NULL... (this is the early abort > pathway, prior to the job being fully realized.) > > ...or to ABORTING (from CREATED once it has fully realized the job, or > from RUNNING, READY, WAITING, or PENDING.) > >>> PAUSE Request a job to pause. > > issued to RUNNING or READY, transitions to PAUSED or STANDBY respectively. > >>> RESUME Request a job to resume from a pause. > > issued to PAUSED or STANDBY, transitions to RUNNING or READY respectively. > >>> SET-SPEED Change the speed limiting parameter of a job. > > No run state change. > >>> COMPLETE Ask a READY job to finish and exit. >>> > > Issued to a READY job, transitions to WAITING. > >>> FINALIZE Ask a PENDING job to perform its graph finalization. > > Issued to a PENDING job, transitions to CONCLUDED. > >>> DISMISS Finish cleaning up an empty job. >> > > Issued to a CONCLUDED job, transitions to NULL. > > >>> And here's my stab at a diagram: >>> >>> +---------+ >>> |UNDEFINED| >>> +--+------+ >>> | >>> +--v----+ >>> +---------+CREATED+-----------------+ >>> | +--+----+ | >>> | | | >>> | +--+----+ +------+ | >>> +---------+RUNNING<----->PAUSED| | >>> | +--+-+--+ +------+ | >>> | | | | >>> | | +------------------+ | >>> | | | | >>> | +--v--+ +-------+ | | >>> +---------+READY<------->STANDBY| | | >>> | +--+--+ +-------+ | | >>> | | | | >>> | +--v----+ | | >>> +---------+WAITING<---------------+ | >>> | +--+----+ | >>> | | | >>> | +--v----+ | >>> +---------+PENDING| | >>> | +--+----+ | >>> | | | >>> +--v-----+ +--v------+ | >>> |ABORTING+--->CONCLUDED| | >>> +--------+ +--+------+ | >>> | | >>> +--v-+ | >>> |NULL<--------------------+ >>> +----+ >> >> Is this diagram missing a few arrowheads? E.g. on the edge between >> RUNNING and WAITING. >> > > Apparently yes. :\ > > (Secretly fixed up in my reply.) > >> Might push the limits of ASCII art, but here goes anyway: can we label >> the arrows with job verbs? >> > > Can you recommend a tool to help me do that? I've been using asciiflow > infinity (http://asciiflow.com) and it's not very good, but I don't have > anything better. I do my ASCII art in Emacs picture-mode. >> Can you briefly explain how this state machine addresses (1) and (2)? >> > > (1) The CONCLUDED state allows jobs to persist in the job query list > after they would have disappeared in 2.11-era QEMU. This lets us query > for completion codes and to dismiss the job at our own leisure. Got it. > (2) The PENDING state allows jobs to wait in a nearly-completed state, > pending authorization from the QMP client to make graph changes. > Otherwise, the job has to asynchronously perform this cleanup and the > exact point in time is unknowable to the QMP client. By making a PENDING > state and a finalize callback (.prepare), we can make this portion of a > job's task synchronous. This provides for jobs modifying the graph on job completion. It doesn't provide for jobs modifying the graph while they run. Fine with me; we're not aware of a use for messing with the graph in the middle of a job. > "John, you added more than two states..." > > Yup, this was to help simplify the existing state machine, believe it or > not. I modeled all jobs as transactions to eliminate different cleanup > routing and added two new interim states; > > - WAITING > - ABORTING > > to help make assertions about the valid transitions jobs can make. The > ABORTING state helps make it clear when a job is allowed to fail (and > emit QMP events related to such). > > The WAITING state is simply advisory to help a client know that a job is > "finished" but cannot yet receive further instruction because of peers > in a transaction. This helps me to add nice QMP errors for any verbs > issued to such jobs. "Sorry pal, this job is waiting and can't hear you > right now!" > > This kept the code cleaner than adding a bunch of very fragile boolean > error-checking pathways in dozens of helper functions to help avoid > illegal instructions on jobs not prepared to receive those instructions. > > So these two new states don't help accomplish (1) or (2) strictly, but > they do facilitate the code additions that _do_ a lot less ugly. Thanks! Looks like a fine starting point for in-tree state machine documentation :)