Hello John, thanks for the response.

I liked the idea of having 2 workers using the same storage, but the main 
goal was to provide some redundancy.
The processes queuing up work for ruote are behind ActiveMQ, so I suppose 
there is still redundancy even if I end up splitting the 2 workers to use 
separate storage.

Rather than troubleshoot ruote-mon I was thinking about trying the redis 
provider to see if it exhibits the same behavior.
For now I'll move ahead with a single worker & attempt to find whatever 
other issues I may be having with the process definitions themselves.

Later on, if I want to continue troubleshooting, is using 
http://ruote.rubyforge.org/noisy.html the only way to go? I recall 
having issues trying to get this to work.
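
If it helps in the meantime, here is a rough way I could check how far the
'dying' state has propagated in a stuck process, working from a dump of its
expressions. This is only a sketch: the hash keys (:expid, :name, :state),
the sample data, and the helper names parent_expid and propagation_frontier
are made up for illustration and are not ruote's API. It assumes expression
ids look like '0_0_1', where dropping the last segment gives the parent.

```ruby
# Hypothetical snapshot of one stuck process, one hash per expression.
# Keys and values are illustrative assumptions, not captured ruote output.
SNAPSHOT = [
  { expid: '0',       name: 'define',   state: 'dying' },
  { expid: '0_0',     name: 'cursor',   state: 'dying' },
  { expid: '0_0_1',   name: 'sequence', state: 'dying' },
  { expid: '0_0_1_1', name: 'wait',     state: nil }
]

# Parent id of an expression id, assuming '_'-separated child indexes.
# Returns nil for the root expression.
def parent_expid(expid)
  idx = expid.rindex('_')
  idx ? expid[0...idx] : nil
end

# Expressions that are in `state` but have no direct child in that same
# state, i.e. the points where propagation appears to have stopped.
def propagation_frontier(expressions, state = 'dying')
  in_state = expressions.select { |e| e[:state] == state }
  in_state.reject do |e|
    in_state.any? { |child| parent_expid(child[:expid]) == e[:expid] }
  end
end

propagation_frontier(SNAPSHOT).each do |e|
  puts "#{e[:expid]} #{e[:name]}"
end
```

Whatever this prints is 'dying' with no 'dying' child, which is where I
would start looking.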

Thanks,
--Matt

On Thursday, February 6, 2014 1:59:30 PM UTC-8, John Mettraux wrote:
>
>
> > On Thursday, February 6, 2014 12:56:21 PM UTC-8, Matthew York wrote: 
> > > 
> > > We've been using the latest version of ruote w/ ruote-mon using 2 
> > > workers for about 6 months now. 
> > > 
> > > Over time as our process definitions have become more complex and 
> > > longer running they have also become less reliable. 
>
> Hello Matthew, 
>
> Combined with the "we just tried using a single worker which seems to be 
> way more stable", I'd say there's some tiny thing wrong in one 
> expression that fails sometimes, and those occasional failures accumulate. 
>
> Or simply a problem with ruote-mon. 
>
> > > Many processes are getting 'stuck' - where they never enter the error 
> > > state and also fail to respond to cancel. 
> > > 
> > > I’ve been using ruote-kit to monitor and clean up these processes, 
> > > which usually works. 
> > > 
> > > In the case where a process is 'stuck' and I attempt to kill it, the 
> > > process changes to the 'dying' state, and never gets removed from the 
> > > list. 
>
> It'd be interesting to know how the dying state propagates in the stuck 
> process expression trees. 
>
> > > This seems to happen around calls to subprocesses where I attempt to 
> > > use the ‘pass’ expression for on_error and on_timeout: 
> > > 
> > >  cursor :timeout => '${v:timeout}', :on_timeout => :pass, :tag => 'wait_for_fqdn_discovery' do 
> > >    get_machine_fqdn 
> > >    sequence :unless => '${f:machine_fqdn}' do 
> > >      log 'waiting 60s' => '${f:machine.machine_id}' 
> > >      wait '60s' 
> > >      rewind 
> > >    end 
> > >  end 
> > > 
> > > refresh_state :on_error => 'pass' 
> > > 
> > > Am I doing this incorrectly? 
>
> It looks OK. Maybe 
> http://ruote.io/common_attributes.html#on_error_composing 
> could help (or bring more "stuckage"). 
>
> On Thu, Feb 06, 2014 at 01:35:13PM -0800, Matthew York wrote: 
> > After some more testing - we tried just using a single worker, which 
> > seems to be way more stable. 
>
> I could let you go with running one worker and hope for the best. Or I 
> could press for more information and locate and fix the issue. Or help you 
> locate and fix the issue, in ruote and/or in ruote-mon. 
>
> Kind regards, 
>
> John 
>
>

