[ruote:3103] Re: Ruote Participant Error trapping/Monitoring

Eric Wed, 25 May 2011 09:01:58 -0700

Thanks for the quick reply.

> Upon final reading of your email and my reply, I think that wrapping your 
> mailer call with a timeout (and some reaction block) is a must. You know it 
> can go wrong and you know how to deal with it.


The timeout seems like a straightforward solution to the specific
problem, and I take the point about different types of participants.
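
For the record, here is a minimal sketch of what wrapping the mailer call
could look like, using only Ruby's standard Timeout module. The helper name
with_timeout and the on_timeout callable are made up for illustration; they
are not part of ruote or of our participant.

```ruby
require 'timeout'

# Hypothetical helper: runs the given block (e.g. the mailer call) and
# gives up after `seconds`. Returns :ok when the block finishes in time,
# otherwise returns whatever the on_timeout callable returns.
def with_timeout(seconds, on_timeout)
  Timeout.timeout(seconds) { yield }
  :ok
rescue Timeout::Error => e
  # The "reaction block": log, flag the workitem, retry later, etc.
  on_timeout.call(e)
end

# Example: a slow block stands in for a mailer that never answers.
result = with_timeout(0.1, ->(_e) { :mail_timed_out }) { sleep 1 }
p result
```

Inside the participant, the block would be the mailer call, and the reaction
could record the failure in the workitem fields before replying to the
engine, so the process does not sit waiting on the mailer.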

> Mirroring indeed, what storage are you using ? You could use 
> Ruote::StorageParticipant and only store via ActiveRecord. Or you could let 
> ruote-sequel or ruote-dm do the storage, while doing further manipulation 
> via ActiveRecord (it can read any table when handled in a persuasive way).

We are currently using ruote-sequel for our storage, and have tried
letting Rails read the table via ActiveRecord, but we ran into some
unintended consequences because we did not understand how ruote was
interacting with the table. That being said, it is probably time to
revisit the approach.

We tend to workflow things like projects, invoices, and requests. What we
ultimately want to end up with is something like:

class Project < ActiveRecord::Base
  has_many :processes
  has_many :workitems, :through => :processes
end

But we don't want to "contaminate" ruote's data structures with our
business logic/data. It might be interesting to pass a model name and
id into ruote when a process or workitem is created and extend the
ruote-sequel structure so that you could create a polymorphic relation
to other tables. Currently we pass the relation data into the process/
workitem, but we have to dig it out of the hash.
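
To make the "dig it out of the hash" part concrete, here is a rough sketch
of the kind of helper we end up writing. The 'model_ref' key, the ModelRef
struct and the method name are assumptions for illustration only; ruote
does not define any of them. It assumes the workitem fields arrive as a
plain Hash with string keys.

```ruby
# Hypothetical convention: the launcher stores a reference to the owning
# record under the 'model_ref' key of the workitem fields, e.g.
#   { 'model_ref' => { 'type' => 'Project', 'id' => 42 } }
ModelRef = Struct.new(:type, :id)

# Returns a ModelRef, or nil when the fields carry no usable reference.
def model_ref_from(fields)
  ref = fields['model_ref']
  return nil unless ref.is_a?(Hash) && ref['type'] && ref['id']
  ModelRef.new(ref['type'].to_s, Integer(ref['id']))
end

fields = { 'task' => 'review', 'model_ref' => { 'type' => 'Project', 'id' => '42' } }
ref = model_ref_from(fields)
p [ref.type, ref.id]
```

With a type/id pair like this, the Rails side could look up the record
itself, which is roughly what a polymorphic relation would do for us if the
pair lived in its own columns.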

Thanks for the help and the participant lesson.

Eric Smith

On May 25, 9:28 am, John Mettraux <[email protected]> wrote:
> On Wed, May 25, 2011 at 06:57:38AM -0700, eric smith wrote:
>
> > (...)
>
> > The error condition itself is that we have a process with a workitem
> > in position 0 but the workitem does not exist (e.g. value of
> > stored_workitem is []). No errors are generated for the process. I
> > noticed in ruote-kit that there is some logic to handle the display of
> > this condition ( processes.html.haml ).
>
> >       %td
> >           - process.position.each do |pos|
> >             - stored_wi = process.stored_workitems.find { |wi| wi.fei.sid == pos[0] }
> >             - text = "#{pos[1]} #{pos[2]['task']}"
> >             - if stored_wi
> >               = alink(:workitems, stored_wi.fei.sid, :text => text)
> >             - else
> >               &= text
>
> Hello Eric,
>
> this is a generic case, not all participants are storage participants, not 
> having a stored workitem is rather the norm, while having one is, well, a 
> sign you're using a variation of a storage participant.
>
> (see further down for more about this).
>
> > The net effect is that the process fails silently.
>
> Sorry, it's not failing [silently], it's waiting for a reply from the 
> participant.
>
> (upon re-reading, I realize you store via super(workitem) after the mailing 
> step..., why don't you put a timeout around the mailer ?)
>
> > We are still
> > looking for the root cause of the defect and why it is cascading thru
> > ruote in this way.
>
> Sorry again, it is not cascading.
>
> > But the interesting thing is that the process is
> > now hung. Obviously writing better code for the participant is the
> > best solution but should ruote know that it is carrying around a dead
> > process?
>
> What about using a timeout ?
>
>   participant :ref => 'toto', :timeout => '2h'
>   participant :ref => 'toto', :timeout => '2h', :on_timeout => 'error'
>
>   sequence :timeout => '2h' do
>     participant 'alfred'
>     participant 'bob'
>   end
>
>  http://ruote.rubyforge.org/common_attributes.html#timeout
>  http://ruote.rubyforge.org/common_attributes.html#on_timeout
>
>   concurrence :count => 1, :remaining => :cancel do
>     sequence do
>       alfred
>       bob
>     end
>     sequence do
>       wait '3h'
>       echo "time out..."
>     end
>   end
>
> > I see a need to be able to validate storage and its processes to
> > ensure that there is not a process in a hung state or a likely hung
> > state. Most of our participant should last less than a minute so it
> > should be pretty easy to build a monitor to look for long/hung
> > workitems. or processes that have no valid workitems. I think we can
> > extend ruote-kit to provide a storage status page to give some
> > rudimentary information about how ruote is feeling.
>
> Ruote needs your help to answer the question "is that process hung ?". By 
> itself it cannot answer that question. "I'm just waiting for the participant 
> to reply".
>
> Hence the "timeout".
>
> Note that you can forego using timeout and periodically "poke" your hung 
> processes.
>
> See
>
>  http://ruote.rubyforge.org/process_administration.html
>
> especially
>
>  http://ruote.rubyforge.org/process_administration.html#re_applying_st...
>
> A variation
>
>   engine.launch_single(Ruote.define 'unstucker' do
>     cron '5 0 * * *' do # every night, five minutes after midnight
>       participant 'process_unstucker' # or something like that
>     end
>   end)
>
> > So my questions:
>
> > 1) What is the best (or most definitive) way to determine if a
> > participant is currently consuming or canceling a participant. It does
> > not seem like the participant state is recorded in the process.
>
> The [participant] expression itself has a state. It's an attribute, during 
> normal operation its value is nil. When an expression is getting cancelled, 
> its value is "cancelling". There is also "failed" and "timing_out".
>
>   engine.process(wfid).expressions.each do |exp|
>     p [ exp.fei.to_s, exp.state ]
>   end
>
> > 2) We are mirroring workitems to an ActiveRecord model, which causes us
> > to go thru a few extra gyrations. We do this by building a base
> > storage participant that syncs the workitems and having all other
> > participants inherit from it.
>
> >    Our base participant: https://gist.github.com/990909
>
> >        Is there a better way to accomplish the same thing that would
> > be more fault tolerant?
>
> Mirroring indeed, what storage are you using ? You could use 
> Ruote::StorageParticipant and only store via ActiveRecord. Or you could let 
> ruote-sequel or ruote-dm do the storage, while doing further manipulation via 
> ActiveRecord (it can read any table when handled in a persuasive way).
>
> I see nothing wrong with your actual scheme, apart from what you admitted, 
> you trusted the mailer a bit too much.
>
> It's weird that your participant goes
>
>   a) store in AR
>   b) send mail
>   c) store in storage (super(workitem))
>
> going a -> c -> b would solve the pseudo-issue mentioned at the top of this 
> reply.
>
> > 3.) Are there other obvious conditions we should look for in monitor/
> > storage validator?
>
> errors and stuck processes, I think the obvious cases are covered.
>
> > 4.) Is it possible to have a participant have an affinity for a
> > particular worker ( or the other way around)?
>
> Yes, but your process could get stuck if all the preferred workers are down.
>
> You add an #accept?(workitem) method to your participant, where you reply 
> true or false. Reply false if the current worker is not suitable.
>
>  http://ruote.rubyforge.org/implementing_participants.html#accept
>
> ...
>
> Upon final reading of your email and my reply, I think that wrapping your 
> mailer call with a timeout (and some reaction block) is a must. You know it 
> can go wrong and you know how to deal with it.
>
> Best regards,
>
> --
> John Mettraux - http://jmettraux.wordpress.com

-- 
you received this message because you are subscribed to the "ruote users" group.
to post : send email to [email protected]
to unsubscribe : send email to [email protected]
more options : http://groups.google.com/group/openwferu-users?hl=en
