After some more testing, we tried running just a single worker, which seems
to be far more stable.
On Thursday, February 6, 2014 12:56:21 PM UTC-8, Matthew York wrote:
>
> Hello,
>
> We've been using the latest version of ruote w/ ruote-mon using 2 workers
> for about 6 months now.
>
> Over time, as our process definitions have become more complex and longer
> running, they have also become less reliable.
>
> Many processes are getting 'stuck' - where they never enter the error
> state and also fail to respond to cancel.
>
> I've been using ruote-kit to monitor and clean up these processes, which
> usually works.
>
> In the case where a process is 'stuck' and I attempt to kill it, the
> process changes to the 'dying' state, and never gets removed from the list.
>
> This seems to happen around calls to subprocesses where I attempt to use
> the ‘pass’ expression for on_error and on_timeout:
>
> cursor :timeout => '${v:timeout}', :on_timeout => :pass,
>        :tag => 'wait_for_fqdn_discovery' do
>   get_machine_fqdn
>   sequence :unless => '${f:machine_fqdn}' do
>     log 'waiting 60s' => '${f:machine.machine_id}'
>     wait '60s'
>     rewind
>   end
> end
>
> refresh_state :on_error => 'pass'
>
> Am I doing this incorrectly?
>