Hello John,

Thanks for the response. I liked the idea of having two workers share the same storage, but the main goal was to provide some redundancy. The processes queuing up work for ruote sit behind ActiveMQ, so I suppose there is still redundancy even if I end up splitting the two workers onto separate storage.

Rather than troubleshoot ruote-mon, I was thinking about trying the Redis storage provider to see if it exhibits the same behavior. For now I'll move ahead with a single worker and try to find whatever other issues I may have in the process definitions themselves.

Later on, if I want to continue troubleshooting, is http://ruote.rubyforge.org/noisy.html the only way to go? I recall having trouble getting it to work.

Thanks,
--Matt

On Thursday, February 6, 2014 1:59:30 PM UTC-8, John Mettraux wrote:
> On Thursday, February 6, 2014 12:56:21 PM UTC-8, Matthew York wrote:
> > We've been using the latest version of ruote w/ ruote-mon using 2 workers
> > for about 6 months now.
> >
> > Over time as our process definitions have become more complex and longer
> > running they have also become less reliable.
>
> Hello Matthew,
>
> Combined with the "we just tried using a single worker which seems to be way
> more stable", I'd say there's something tiny wrong in one expression that
> fails sometimes and the sometimes do accumulate.
>
> Or simply a problem with ruote-mon.
>
> > Many processes are getting 'stuck' - where they never enter the error
> > state and also fail to respond to cancel.
> >
> > I've been using ruote-kit to monitor and clean up these processes, which
> > usually works.
> >
> > In the case where a process is 'stuck' and I attempt to kill it, the
> > process changes to the 'dying' state, and never gets removed from the list.
>
> It'd be interesting to know how the dying state propagates in the stuck
> process expression trees.
>
> > This seems to happen around calls to subprocesses where I attempt to use
> > the 'pass' expression for on_error and on_timeout:
> >
> >   cursor :timeout => '${v:timeout}', :on_timeout => :pass, :tag => 'wait_for_fqdn_discovery' do
> >     get_machine_fqdn
> >     sequence :unless => '${f:machine_fqdn}' do
> >       log 'waiting 60s' => '${f:machine.machine_id}'
> >       wait '60s'
> >       rewind
> >     end
> >   end
> >
> >   refresh_state :on_error => 'pass'
> >
> > Am I doing this incorrectly?
>
> It looks OK. Maybe
> http://ruote.io/common_attributes.html#on_error_composing
> could help (or bring more "stuckage").
>
> On Thu, Feb 06, 2014 at 01:35:13PM -0800, Matthew York wrote:
> > After some more testing - we tried just using a single worker which seems
> > to be way more stable.
>
> I could let you go with running one worker and hope for the best. Or I could
> press for more information and locate and fix the issue. Or help you locate
> and fix the issue, in ruote and/or in ruote-mon.
>
> Kind regards,
>
> John
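
For anyone reading the quoted `cursor` definition later, its intent (poll for the FQDN, wait, rewind, and move on quietly once the timeout is reached) can be sketched in plain Ruby. This is only an analogy and does not use ruote at all: `poll_until`, `clock` and `sleeper` are hypothetical names made up for this sketch, and unlike ruote, which schedules the timeout separately from the cursor body, the sketch only checks the deadline between polls.

```ruby
# Plain-Ruby analogue of the quoted cursor; this is NOT ruote code.
# poll_until, clock and sleeper are hypothetical names used for illustration.
def poll_until(timeout:, interval:,
               clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
               sleeper: ->(seconds) { sleep(seconds) })
  deadline = clock.call + timeout
  attempts = 0

  loop do
    attempts += 1
    value = yield                      # plays the role of get_machine_fqdn
    return [value, attempts] if value  # like the :unless guard: stop once the value is set

    # Analogue of :on_timeout => :pass: give up quietly instead of raising.
    return [nil, attempts] if clock.call + interval > deadline

    sleeper.call(interval)             # plays the role of wait '60s'
  end                                  # looping again plays the role of rewind
end

# Usage with a fake clock, so the example finishes instantly.
now = 0.0
answers = [nil, nil, 'host.example.com']
result = poll_until(timeout: 300, interval: 60,
                    clock: -> { now },
                    sleeper: ->(seconds) { now += seconds }) { answers.shift }
p result
```

Passing the clock and the sleeper in as lambdas keeps the loop testable without real waiting. The timed-out case returns `nil` rather than raising, which is the closest plain-Ruby equivalent of letting the flow continue after `:on_timeout => :pass`.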
