[ruote:3101] Ruote Participant Error trapping/Monitoring

eric smith Wed, 25 May 2011 06:58:19 -0700

Hello all,

We are seeing an interesting error case.  Ruote will start to consume
a participant, and fail because the participant calls some other method
that fails. In our case we are calling a mailer and getting an error
from the SMTP server. Normally errors are caught by the participant's
error handler and the participant / process can recover. In this case
the participant does not complete and does not raise an error, yet
there are no workitems for the process.


The error condition itself is that we have a process with a workitem
in position 0 but the workitem does not exist (i.e. the value of
stored_workitems is []). No errors are generated for the process. I
noticed in ruote-kit that there is some logic to handle the display of
this condition (processes.html.haml):

      %td
        - process.position.each do |pos|
          - stored_wi = process.stored_workitems.find { |wi| wi.fei.sid == pos[0] }
          - text = "#{pos[1]} #{pos[2]['task']}"
          - if stored_wi
            = alink(:workitems, stored_wi.fei.sid, :text => text)
          - else
            &= text

The net effect is that the process fails silently.  We are still
looking for the root cause of the defect and why it is cascading
through ruote in this way. But the interesting thing is that the
process is now hung. Obviously writing better code for the participant
is the best solution, but should ruote know that it is carrying around
a dead process?

I see a need to be able to validate storage and its processes to
ensure that there is not a process in a hung state or a likely hung
state. Most of our participants should finish in less than a minute,
so it should be pretty easy to build a monitor that looks for
long-running/hung workitems, or for processes that have no valid
workitems. I think we can extend ruote-kit to provide a storage status
page to give some rudimentary information about how ruote is feeling.
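
As a rough sketch of the second check (processes that have no valid
workitems), something along these lines might work. It is plain Ruby
and only relies on the process objects responding to position and
stored_workitems the way they do in the haml above; the wfid and
errors calls, and the helper names orphan_positions and
suspect_processes, are assumptions made for illustration. Since a
participant that does not store workitems also shows up in position
without a stored workitem, the result is a list of candidates to look
at, not proof of a hang.

```ruby
# Sketch only. Works on any objects that respond to wfid, position,
# stored_workitems and errors (wfid and errors are assumed here).

# Position entries of +process+ that have no matching stored workitem.
# Each entry is expected to look like [fei_sid, participant_name, info].
def orphan_positions(process)
  stored_sids = process.stored_workitems.map { |wi| wi.fei.sid }
  process.position.reject { |pos| stored_sids.include?(pos[0]) }
end

# Maps wfid => orphan position entries for every process that reports
# no errors but still has a position without a stored workitem.
def suspect_processes(processes)
  processes.each_with_object({}) do |process, report|
    next unless process.errors.empty? # errored processes are already visible
    orphans = orphan_positions(process)
    report[process.wfid] = orphans unless orphans.empty?
  end
end
```

A status page could then call suspect_processes with whatever list of
processes it already renders and show the returned wfids.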

So my questions:
1)      What is the best (or most definitive) way to determine if a
participant is currently consuming or canceling a workitem? It does
not seem like the participant state is recorded in the process.

2)      We are mirroring workitems to an ActiveRecord model, which causes
us to go through a few extra gyrations. We do this by building a base
storage participant that syncs the workitems and having all other
participants inherit from it.

        Our base participant: https://gist.github.com/990909

       Is there a better way to accomplish the same thing that would
be more fault tolerant?

3)      Are there other obvious conditions we should look for in a
monitor/storage validator?

4)      Is it possible for a participant to have an affinity for a
particular worker (or the other way around)?


Thanks

Eric Smith



