Re: [openstack-dev] [Mistral] Proposal for the Resume Feature

2015-06-16 Thread Lingxian Kong
Winson, thanks for the etherpad to gather different opinions.

Dmitri, I think it's fine to keep discussing here so that more people get
involved; we could use the etherpad for the summary.

-- 
Regards!
---
Lingxian Kong

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Mistral] Proposal for the Resume Feature

2015-06-16 Thread W Chan
Here's the etherpad link.  I replied to the comments and feedback there.
Please feel free to continue the conversation there.
https://etherpad.openstack.org/p/mistral-resume


Re: [openstack-dev] [Mistral] Proposal for the Resume Feature

2015-06-16 Thread Dmitri Zimine
+1, great write-up, Winson.

I propose we move the discussion to an etherpad and flesh out the details there so
they won’t get lost in a long thread.
Winson, would you care to create one and post it here?

Re: ‘error state’: I think it’s not absolutely necessary. Pause/resume can be
done without enabling the ‘error->running’ transition
by making the default task policy `on-error: pause`, so that if the user chooses, the
workflow goes into the paused state on errors.
But it may be convenient, so I have no strong opinion on this yet.
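
To make the two options concrete, here is a minimal sketch in Python. The state names, the transition table, and the `allow_error_resume` and `pause_on_error` switches are assumptions made for illustration; this is not Mistral's actual state-handling code.

```python
# Hypothetical sketch; state names and table are assumptions, not Mistral's code.
IDLE, RUNNING, PAUSED, SUCCESS, ERROR = "IDLE", "RUNNING", "PAUSED", "SUCCESS", "ERROR"

# Transitions assumed to be allowed without the proposed change.
BASE_TRANSITIONS = {
    IDLE: {RUNNING, ERROR},
    RUNNING: {PAUSED, SUCCESS, ERROR},
    PAUSED: {RUNNING, ERROR},
    SUCCESS: set(),
    ERROR: set(),
}


def is_valid_transition(from_state, to_state, allow_error_resume=False):
    """Return True if a workflow may move from from_state to to_state."""
    if from_state == to_state:
        return True
    allowed = set(BASE_TRANSITIONS.get(from_state, set()))
    if allow_error_resume and from_state == ERROR:
        # The 'error->running' transition under discussion.
        allowed.add(RUNNING)
    return to_state in allowed


def state_after_task_failure(pause_on_error):
    """With a pause-on-error default the workflow lands in PAUSED, not ERROR."""
    return PAUSED if pause_on_error else ERROR
```

With `pause_on_error=True` the workflow ends up in PAUSED, from which RUNNING is already reachable in this table; otherwise resuming needs `allow_error_resume=True`.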


Re: checkpoints and roll-backs - yes! I see this and pause-resume as complementary.
To be precise on terminology, workflows don't “roll back” - that is more of a
transactional term. They “compensate” by running a ‘compensation workflow’
that gets the system back to a checkpoint state.
At the end of the compensation process the system goes into the “paused” state, where it
can be resumed once the ‘cause of failure’ is fixed.
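
A rough sketch of that compensation idea, assuming a linear list of completed tasks and one optional "undo" callable per task. The function and argument names are made up for illustration and are not part of Mistral.

```python
# Hypothetical sketch: names are illustrative, not part of Mistral.
def compensate(completed_tasks, checkpoint, compensations):
    """Undo tasks completed after `checkpoint`, newest first.

    completed_tasks: task names in the order they completed.
    checkpoint: name of the task to return to (it stays completed).
    compensations: dict mapping task name -> callable that undoes it.
    Returns (remaining_completed_tasks, new_workflow_state).
    """
    if checkpoint not in completed_tasks:
        raise ValueError("unknown checkpoint: %s" % checkpoint)
    keep = completed_tasks.index(checkpoint) + 1
    for task in reversed(completed_tasks[keep:]):
        undo = compensations.get(task)
        if undo is not None:
            undo()
    # After compensation the workflow waits in PAUSED until it is resumed.
    return completed_tasks[:keep], "PAUSED"
```

The compensating actions run in reverse completion order, and the workflow ends in the paused state rather than resuming automatically.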

DZ. 


Re: [openstack-dev] [Mistral] Proposal for the Resume Feature

2015-06-15 Thread BORTMAN, Limor (Limor)
+1,
I just have one question. Do we want to be able to resume a WF in an error state?
I mean, that isn't a real "resume"; it should be more of a rerun, don't you think?
So in an error state we would create a new executor and just re-run it.
Thanks, Limor




Re: [openstack-dev] [Mistral] Proposal for the Resume Feature

2015-06-15 Thread Lingxian Kong
Thanks Winson for the write-up, very detailed information. (The format was good.)

I'm totally in favor of your idea. Actually, I really think your
proposal is complementary to my proposal in
https://etherpad.openstack.org/p/vancouver-2015-design-summit-mistral,
please see the 'Workflow rollback/recovery' section.

What I want to do is configure some 'checkpoints' throughout the
workflow. If some task fails, we could roll back the execution to
some checkpoint and resume the whole workflow after we have fixed
the problem, as if the execution had never failed.
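
As a minimal sketch of the checkpoint idea, assuming a linear workflow and a set of task names marked as checkpoints (both are assumptions for illustration, not an existing Mistral feature), choosing the rollback point could look like this:

```python
# Hypothetical sketch; a linear workflow and the names used are assumptions.
def plan_rollback(task_order, checkpoints, failed_task):
    """Return (checkpoint, tasks_to_rerun) for a failed task.

    task_order: task names in execution order (linear workflow assumed).
    checkpoints: set of task names marked as checkpoints.
    The nearest checkpoint before the failed task is chosen; if there is
    none, the checkpoint is None and the whole workflow is re-run.
    """
    failed_index = task_order.index(failed_task)  # ValueError if unknown
    for i in range(failed_index - 1, -1, -1):
        if task_order[i] in checkpoints:
            # Everything after the checkpoint is executed again on resume.
            return task_order[i], task_order[i + 1:]
    return None, list(task_order)
```

For a workflow a, b, c, d with checkpoints on a and b, a failure in d rolls back to b and re-runs c and d.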

It's just an initial idea. I'm waiting for our discussion to see if it
really makes sense to users and to get feedback; then we can talk about
the implementation and cooperation.


-- 
Regards!
---
Lingxian Kong



Re: [openstack-dev] [Mistral] Proposal for the Resume Feature

2015-06-15 Thread W Chan
Resending to see if this fixes the formatting for outlines below.


I want to continue the discussion on the workflow "resume" feature.


Resuming from our last conversation @
http://lists.openstack.org/pipermail/openstack-dev/2015-March/060265.html.
I don't think we should limit how users resume. There are different
possible scenarios. The user can fix the environment or condition that led to
the failure of the current task and then just want to re-run the
failed task.  Or the user can fix the environment/condition, including
what the task was doing, and then just want to continue with the next
set of task(s).


The following is a list of proposed changes.


1. A new CLI operation to resume a WF (i.e. mistral workflow-resume).

    A. If no additional info is provided, assume this WF was manually paused
    and there are no task/action execution errors. The WF state is updated to
    RUNNING. Update using the put method @ ExecutionsController. The put method
    checks that there are no task/action execution errors.

    B. If the WF is in an error state

        i. To resume from the failed task, the workflow-resume command requires
        the WF execution ID, task name, and/or task input.

        ii. To resume from a failed with-items task

            a. Re-running the entire task (re-run all items) requires the WF
            execution ID, task name, and/or task input.

            b. Re-running a single item requires the WF execution ID, task name,
            with-items index, and/or task input for the item.

            c. Re-running selected items requires the WF execution ID, task name,
            with-items indices, and/or task input for each item.

        iii. To resume from the next task(s), the workflow-resume
        command requires the WF execution ID, failed task name, output for the
        failed task, and a flag to skip the failed task.


2. Make ERROR -> RUNNING a valid state transition @ the is_valid_transition
function.


3. Add a comments field to the Execution model. Add a note that indicates the
execution was launched by workflow-resume. Auto-populated in this case.


4. Resume from the failed task.

    A. Re-run the task with the same task inputs >> POST a new action execution
    for the task execution @ ActionExecutionsController

    B. Re-run the task with different task inputs >> POST a new action execution
    for the task execution, with different input allowed @
    ActionExecutionsController


5. Resume from the next task(s).

    A. Inject a noop task execution or noop action execution (undecided
    yet) for the failed task with appropriate output.  The spec is an ad hoc
    spec that copies conditions from the failed task. This provides some audit
    functionality and should trigger the next set of task executions (in case
    of branching and such).
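
The argument rules for the proposed workflow-resume operation in item 1 can be sketched as a small validation function. This is only an illustration under assumed names: the function, its parameters, and the returned labels are hypothetical and do not describe an existing Mistral API.

```python
# Hypothetical sketch; parameter names are assumptions, not a real Mistral API.
def validate_resume_request(wf_state, task_name=None, task_input=None,
                            item_indices=None, skip_failed=False,
                            failed_task_output=None):
    """Classify a workflow-resume request or raise ValueError.

    task_input is optional in every error-state mode ("and/or task input"),
    so it is accepted but never required here.
    """
    if wf_state == "PAUSED":
        if task_name or skip_failed:
            raise ValueError("a manually paused WF takes no task arguments")
        return "resume-paused"          # no additional info provided
    if wf_state != "ERROR":
        raise ValueError("cannot resume a WF in state %s" % wf_state)
    if task_name is None:
        raise ValueError("task name is required for a WF in error state")
    if skip_failed:
        if failed_task_output is None:
            raise ValueError("output for the skipped task is required")
        return "resume-from-next"       # skip the failed task
    if item_indices:
        return "rerun-items"            # single or selected with-items entries
    return "rerun-task"                 # whole task, with-items or not
```

For example, `validate_resume_request("ERROR", task_name="create_vm", item_indices=[0, 2])` is classified as a re-run of selected items, while omitting the task name for an errored WF is rejected.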


Re: [openstack-dev] [Mistral] Proposal for the Resume Feature

2015-03-31 Thread Renat Akhmerov
Hi,

Thanks guys for bringing this topic up for discussion. In my opinion, this
feature is extremely important and will move Mistral further toward being a truly
useful tool. I think it’s one of the “must have” features of Mistral.


> On 31 Mar 2015, at 08:56, Dmitri Zimine  wrote:
> 
> @Lingxian Kong
>>  The context for a task is used
>> internally. I know the aim of this feature is to make it very easy
>> and convenient for users to see the details of the workflow execution,
>> but what can users do next with the context? Do you have a plan to
>> let users change that context for a task? If the answer is no, I think
>> it is not really necessary to expose the context endpoint.
> 
> I think the answer is “yes, users will change the context”; this falls out of
> use case #3.
> Let’s be specific: a create_vm task failed due to, say, a network connection problem.
> As a user, I created the VM manually and now want to continue the workflow.
> The next step is to attach storage to the VM, which needs the VM ID published
> variable. So the user needs to
> modify the outgoing context of the create_vm task.

Agree with Dmitri here.


> Might use case 2 be sufficient?
> We are also likely to specify multiple tasks: say a parallel execution of
> two tasks (create VM, create DNS record) failed, again due to network
> conditions. When the network
> is back I want to continue, but re-run those two exact tasks.
> 
> Another point, maybe obvious, but let’s articulate it: we re-run the task, not
> an individual action within the task.
> In the context of with_items, retry, and repeat, this will lead to running actions
> multiple times.
> 
> Finally, workflow execution traceability. We need to get to the point of 
> tracing pause and resume as workflow events. 
> 
> @Lingxian Kong
>>  we can introduce the notification
>> system to Mistral, which is heavily used in other OpenStack projects.
> Care to elaborate? Thanks!

I’m curious too. Lingxian, could you please explain in more detail what exactly
you mean?

>> On Fri, Mar 27, 2015 at 11:20 AM, W Chan wrote:
>>> We assume a WF is in a paused/errored state when 1) the user manually pauses the WF,
>>> 2) a pause is specified on a transition (on-condition(s) such as on-error), or
>>> 3) a task errored.
>>> 
>>> The resume feature will support the following use cases.
>>> 1) User resumes the WF from a manual pause.
>>> 2) In the case of task failure, the user fixed the problem manually outside of
>>> Mistral and wants to re-run the failed task.
>>> 3) In the case of task failure, the user fixed the problem manually outside of
>>> Mistral and wants to resume from the next task.
>>> 
>>> Resuming from #1 should be straightforward.

Just to clarify: this already works.

>>> Resuming from #2, the user may want to change the inbound context.
>>> Resuming from #3, the user is required to manually provide the published vars
>>> for the failed task(s).


These two cases are basically what we need to implement.

Winson, very good and clear summary (at least to me). I would suggest we
prepare a slightly more formal (but not too formal) spec of what we’re going to
do here. A few examples would help us understand the topic better. So
specifically, it would be interesting to see:

- What endpoints we are going to add and roughly how they would look
  (calculating the requirements that need to be satisfied in order to resume a
  workflow, task contexts).
- A few typical scenarios of resuming a workflow, with explanations of how we
  modify contexts or published vars and how we resume the workflow. The trivial
  case (#1) can be skipped as it’s already implemented.
- Roughly formed suggestions on how that all could be implemented.

This is just my preference, and I personally don’t want you to spend much
time on it, but if it’s possible to prepare it within a reasonable amount of
time, that would be helpful.

Thanks

Renat Akhmerov
@ Mirantis Inc.



Re: [openstack-dev] [Mistral] Proposal for the Resume Feature

2015-03-30 Thread Dmitri Zimine
Thanks Winson for the summary. 

@Lingxian Kong
>  The context for a task is used
> internally. I know the aim of this feature is to make it very easy
> and convenient for users to see the details of the workflow execution,
> but what can users do next with the context? Do you have a plan to
> let users change that context for a task? If the answer is no, I think
> it is not really necessary to expose the context endpoint.

I think the answer is “yes, users will change the context”; this falls out of use
case #3.
Let’s be specific: a create_vm task failed due to, say, a network connection problem.
As a user, I created the VM manually and now want to continue the workflow.
The next step is to attach storage to the VM, which needs the VM ID published
variable. So the user needs to
modify the outgoing context of the create_vm task.
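
A minimal sketch of that scenario, assuming the context is a plain dict and the user supplies the published variables by hand. The function and variable names (including `vm_id`) are hypothetical and only illustrate the idea.

```python
# Hypothetical sketch; the function and variable names are assumptions.
def resume_from_next(context, failed_task, published, next_task_inputs):
    """Build the context for the task after a manually fixed failed task.

    context: dict of variables visible before the failed task ran.
    published: variables the user supplies in place of the failed task's
        output (its outgoing context).
    next_task_inputs: names the next task needs to find in the context.
    """
    new_context = dict(context)
    new_context.update(published)
    missing = sorted(set(next_task_inputs) - set(new_context))
    if missing:
        raise ValueError(
            "cannot skip %s, missing published vars: %s"
            % (failed_task, ", ".join(missing)))
    return new_context
```

Here the user who created the VM by hand would pass something like `{"vm_id": "..."}` as `published`, so the attach-storage step finds the ID it needs.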

Might use case 2 be sufficient?
We are also likely to specify multiple tasks: say a parallel execution of
two tasks (create VM, create DNS record) failed, again due to network
conditions. When the network
is back I want to continue, but re-run those two exact tasks.

Another point, maybe obvious, but let’s articulate it: we re-run the task, not
an individual action within the task.
In the context of with_items, retry, and repeat, this will lead to running actions
multiple times.
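
To illustrate the with_items case: re-running the task means one action execution per item, including items that succeeded the first time. The sketch below uses made-up names and a plain callable as the action; it is not Mistral code.

```python
# Hypothetical sketch; the function name and arguments are assumptions.
def rerun_with_items_task(action, items):
    """Re-run a whole with-items task: one action execution per item."""
    executions = []
    for index, item in enumerate(items):
        # Every item is executed again, including ones that succeeded before.
        executions.append({"index": index, "result": action(item)})
    return executions
```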

Finally, workflow execution traceability. We need to get to the point of 
tracing pause and resume as workflow events. 

@Lingxian Kong
>  we can introduce the notification
> system to Mistral, which is heavily used in other OpenStack projects.
Care to elaborate? Thanks!

DZ.



Re: [openstack-dev] [Mistral] Proposal for the Resume Feature

2015-03-26 Thread Lingxian Kong
On Fri, Mar 27, 2015 at 11:20 AM, W Chan  wrote:
> We assume a WF is in a paused/errored state when 1) the user manually pauses the WF,
> 2) a pause is specified on a transition (on-condition(s) such as on-error), or
> 3) a task errored.
>
> The resume feature will support the following use cases.
> 1) User resumes the WF from a manual pause.
> 2) In the case of task failure, the user fixed the problem manually outside of
> Mistral and wants to re-run the failed task.
> 3) In the case of task failure, the user fixed the problem manually outside of
> Mistral and wants to resume from the next task.
This use case really does make sense to me.
>
> Resuming from #1 should be straightforward.
> Resuming from #2, the user may want to change the inbound context.
> Resuming from #3, the user is required to manually provide the published vars
> for the failed task(s).
>
> In our offline discussion, there's ambiguity with the on-error clause and
> whether a task failure has already been addressed by the WF itself.  In many
> cases, the on-error tasks may just be logging, email notification, and/or
> other non-recovery procedures.  It's hard to determine that automatically,
> so we let users decide where to resume the WF instead.  Mistral will let
> the user resume a WF from a specific point. The resume function will determine the
> requirements needed to successfully resume.  If requirements are not met,
> then resume returns an error saying what requirements are missing.  In the
> case where there are failures in multiple parallel branches, the
> requirements may include more than one task.  For cases where the user
> accidentally resumes from an earlier task that has already successfully
> completed, the resume function should detect that and throw an exception.
>
> Also, the current change to separate task from action execution should be
> sufficient for traceability.
>
> We also want to expose an endpoint to let users view the context for a task.
> This gives the user a reference to the current task context so they can
> determine the delta they need to change for a successful resume.
IMHO, I'm afraid I can't agree here. The context for a task is used
internally. I know the aim of this feature is to make it very easy
and convenient for users to see the details of the workflow execution,
but what can users do next with the context? Do you have a plan to
let users change that context for a task? If the answer is no, I think
it is not really necessary to expose the context endpoint.

However, considering the importance of context for the task
execution (the resuming feature), we can introduce the notification
system to Mistral, which is heavily used in other OpenStack projects.



-- 
Regards!
---
Lingxian Kong



[openstack-dev] [Mistral] Proposal for the Resume Feature

2015-03-26 Thread W Chan
We assume a WF is in a paused/errored state when 1) the user manually pauses the WF,
2) a pause is specified on a transition (on-condition(s) such as on-error), or
3) a task errored.

The resume feature will support the following use cases.
1) User resumes the WF from a manual pause.
2) In the case of task failure, the user fixed the problem manually outside of
Mistral and wants to re-run the failed task.
3) In the case of task failure, the user fixed the problem manually outside of
Mistral and wants to resume from the next task.

Resuming from #1 should be straightforward.
Resuming from #2, the user may want to change the inbound context.
Resuming from #3, the user is required to manually provide the published vars
for the failed task(s).

In our offline discussion, there's ambiguity with the on-error clause and
whether a task failure has already been addressed by the WF itself.  In
many cases, the on-error tasks may just be logging, email notification,
and/or other non-recovery procedures.  It's hard to determine that
automatically, so we let users decide where to resume the WF instead.
Mistral will let the user resume a WF from a specific point. The resume function
will determine the requirements needed to successfully resume.  If
requirements are not met, then resume returns an error saying what
requirements are missing.  In the case where there are failures in multiple
parallel branches, the requirements may include more than one task.  For
cases where the user accidentally resumes from an earlier task that has already
successfully completed, the resume function should detect that and throw an
exception.
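
A minimal sketch of such a requirement check, assuming task states are available as a simple dict. The `ResumeError` class, the state strings, and the data layout are assumptions for illustration rather than Mistral's implementation.

```python
# Hypothetical sketch; the data layout and names are assumptions.
class ResumeError(Exception):
    pass


def check_resume(task_states, resume_from):
    """Validate the tasks a user asks to resume from.

    task_states: dict mapping task name -> "SUCCESS", "ERROR" or "IDLE".
    resume_from: iterable of task names chosen by the user.
    Returns the sorted list of failed tasks that are still unaddressed;
    an empty list means the resume can proceed.
    """
    chosen = set(resume_from)
    for name in sorted(chosen):
        if name not in task_states:
            raise ResumeError("unknown task: %s" % name)
        if task_states[name] == "SUCCESS":
            raise ResumeError("task already completed successfully: %s" % name)
    failed = {n for n, s in task_states.items() if s == "ERROR"}
    return sorted(failed - chosen)
```

With two failed parallel branches, naming only one of them returns the other as a missing requirement, and naming a task that already succeeded raises an exception.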

Also, the current change to separate task from action execution should be
sufficient for traceability.

We also want to expose an endpoint to let users view the context for a task.
This gives the user a reference to the current task context so they can
determine the delta they need to change for a successful resume.