> On Thursday, February 6, 2014 12:56:21 PM UTC-8, Matthew York wrote:
> >
> > We've been using the latest version of ruote w/ ruote-mon using 2 workers
> > for about 6 months now.
> >
> > Over time as our process definitions have become more complex and longer
> > running they have also become less reliable.

Hello Matthew,

Combined with the "we just tried using a single worker which seems to be way
more stable", I'd say there's something tiny going wrong in one expression:
it fails only sometimes, and those occasional failures accumulate.

Or simply a problem with ruote-mon.

> > Many processes are getting 'stuck' - where they never enter the error
> > state and also fail to respond to cancel.
> >
> > I’ve been using ruote-kit to monitor and clean up these processes which
> > usually works.
> >
> > In the case where a process is 'stuck' and I attempt to kill it, the
> > process changes to the 'dying' state, and never gets removed from the list.

It'd be interesting to know how the dying state propagates through the
expression trees of the stuck processes.
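
A rough sketch of how one might look at that, in plain Ruby. The `Exp` struct
is a hypothetical stand-in carrying only an expid, a name and a state; with a
real engine the values would presumably come from something like
`dashboard.process(wfid).expressions` (reading `fei.expid`, `name` and `state`
off each expression), which is an assumption here, not something verified
against your setup. A nil state is assumed to mean "not cancelling, not dying".

```ruby
# Hypothetical stand-in for a flow expression: only the three values used below.
Exp = Struct.new(:expid, :name, :state)

# One line per expression, parents before children, indented by tree depth.
def state_report(expressions)
  expressions
    .sort_by { |e| e.expid.split('_').map(&:to_i) }
    .map do |e|
      indent = '  ' * e.expid.count('_')
      "#{indent}#{e.expid} #{e.name} #{e.state || 'active'}"
    end
end

# Expressions whose parent is in the given state while they are not:
# candidates for the spot where the kill stopped propagating.
def unpropagated(expressions, state = 'dying')
  by_id = expressions.each_with_object({}) { |e, h| h[e.expid] = e }
  expressions.select do |e|
    parent = by_id[e.expid.rpartition('_').first]
    parent && parent.state == state && e.state != state
  end
end

# Made-up example data, not taken from a real process.
exps = [
  Exp.new('0_0_1', 'wait', nil),
  Exp.new('0', 'define', 'dying'),
  Exp.new('0_0', 'cursor', 'dying')
]

puts state_report(exps)
puts unpropagated(exps).map(&:expid).inspect
```

If the second list is not empty for a stuck process, the expressions it names
are where I'd start looking.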

> > This seems to happen around calls to subprocesses where I attempt to use
> > the ‘pass’ expression for on_error and on_timeout:
> >
> >  cursor :timeout => '${v:timeout}', :on_timeout => :pass,
> >    :tag => 'wait_for_fqdn_discovery' do
> >    get_machine_fqdn
> >    sequence :unless => '${f:machine_fqdn}' do
> >      log 'waiting 60s' => '${f:machine.machine_id}'
> >      wait '60s'
> >      rewind
> >    end
> >  end
> >
> >  refresh_state :on_error => 'pass'
> >
> > Am I doing this incorrectly?

It looks OK. Maybe http://ruote.io/common_attributes.html#on_error_composing
could help (or bring more "stuckage").

On Thu, Feb 06, 2014 at 01:35:13PM -0800, Matthew York wrote:
> After some more testing - we tried just using a single worker which seems
> to be way more stable.

I could let you go with running one worker and hope for the best. Or I could
press for more information and locate and fix the issue. Or help you locate
and fix the issue, in ruote and/or in ruote-mon.

Kind regards,

John
