Hi Nicola,

I'm working on a system similar to the one you described above. I'm
using RabbitMQ to offload all computation to discrete, standalone
processors.

Architecture-wise, I'm doing the following:

1. A Rails-backed RESTful API lets clients launch jobs.
2. Each received job is translated into a Ruote process definition
   (AST).
3. I run Ruote on Redis-backed storage, which lets me attach multiple
   workers.
4. RuoteAMQP remote participants are used heavily: a Ruote worker picks
   up an expression and dispatches the job to RabbitMQ.
5. A number of discrete, standalone processors (implemented with
   DaemonKit) subscribe to designated RabbitMQ queues. They do their
   work (it could take minutes, hours, or even longer; it doesn't
   matter) and return the finished workitem to Ruote so that the
   process continues.
6. Any error that occurs raises an exception within Ruote, so the whole
   job goes into on_error and is paused.
7. I built an admin interface (my human participant) to monitor all
   logged and launched jobs. It shows me which remote participant
   failed and gives me a chance to fix the participant and re_apply
   the errored job.
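
To make the dispatch/reply round trip in steps 4 to 6 a bit more concrete,
here is a minimal sketch in plain Ruby. It is not my real code and it does
not use Ruote or RabbitMQ at all: two in-process Queue objects stand in for
the RabbitMQ queues, and the names JOBS, REPLIES, start_processor and
dispatch_and_wait, as well as the 'input'/'result' fields, are made up for
illustration. It only shows the shape of the exchange: the workitem goes out
as JSON, a standalone processor works on it, and the reply either continues
the process or leaves the job paused on error.

```ruby
require 'json'

# Assumption: in-process stand-ins for two RabbitMQ queues. In the real
# setup these would be AMQP queues reached through RuoteAMQP.
JOBS    = Queue.new # engine side -> processor
REPLIES = Queue.new # processor   -> engine side

# Hypothetical standalone processor: pops JSON workitems, does some work,
# and sends the workitem back with either a result or an error attached.
def start_processor
  Thread.new do
    while (raw = JOBS.pop) != :stop
      workitem = JSON.parse(raw)
      begin
        # Placeholder for the long-running computation.
        n = Integer(workitem['fields']['input'])
        workitem['fields']['result'] = n * 2
      rescue ArgumentError, TypeError => e
        workitem['error'] = e.message
      end
      REPLIES << JSON.generate(workitem)
    end
  end
end

# Hypothetical engine side: dispatch a workitem and block until the reply
# arrives. A reply carrying an error leaves the job paused for a human to
# inspect and re-apply; otherwise the process would continue.
def dispatch_and_wait(fields)
  JOBS << JSON.generate('fields' => fields)
  reply = JSON.parse(REPLIES.pop)
  state = reply.key?('error') ? :paused_on_error : :completed
  [state, reply]
end
```

Calling start_processor once and then dispatch_and_wait({ 'input' => '21' })
gives back a :completed state with the result in the workitem fields, while a
non-numeric input comes back as :paused_on_error. In the real system the
blocking wait is replaced by Ruote simply resuming the process when the
workitem returns.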

Nicola, I hope this helps in some way. Unfortunately, what I'm
building is closed source, so I cannot disclose any more details.

A question of my own: John, does what I describe make sense to you at
all? Am I over-complicating things? Or are there aspects of Ruote that
I have simply missed and that could simplify things a little? Thanks,
John!

On Feb 29, 3:55 am, Nicola <[email protected]> wrote:
> Hello,
> I am pretty new to Ruote and I would like some suggestions/guidelines about
> how to develop an architecture for the scenario I am going to describe.
> Pointers to existing code would also be appreciated.
>
> A lab needs to run automatic and computationally expensive data analyses
> (which can last days or even weeks); each such task is the execution of one
> or more command-line tools, currently launched manually from a shell. Some
> tasks must run sequentially and some others may run concurrently. When a task
> is over, typically some manual inspection of the output must be carried out
> to decide whether the workflow can proceed to the next (time-expensive)
> analysis. Sometimes (more often than not), tasks are not completed because an
> error occurs (for example, the task hits a memory limit and it is killed by
> the operating system), so a human agent must decide what to do (relaunch,
> cancel, start a different workflow, etc.). The reason I'd like to use Ruote
> for this instead of some job scheduler is that there is a non-linear mix of
> computer and human tasks to be performed, and a job scheduler is not flexible
> enough (in particular, it does not handle the human part).
>
> The main problem I am facing is: what is the best way to execute the computer
> tasks in separate (Ruby) processes, and start/stop them and track their
> status, say, from a web interface à la ruote-kit? As far as I can see, Ruote
> can spawn a new thread when it hands a workitem to a participant, but not a
> new (Ruby) process. Who should be responsible for spawning a separate Ruby
> process? Should it be done in the participant? Or in the “main” program? Or
> should I use some client-server architecture?
>
> I have a feeling that workers play a role here, but how they... work (ehm) is
> not that clear to me yet. I have read both the Ruote-Kit Readme and the blog
> post about Ruote 2.1, but what code like this does
>
> storage = Ruote::FsStorage.new('ruote_work')
> worker = Ruote::Worker.new(storage)
> worker.run
>   # current thread is now running worker
>
> and whether it is self-contained is still a mystery to me (how does this know
> what to look for in the storage? What does it “run”?). When I run it, a script
> like the above either gets stuck or it gives an error like “no JSON backend
> found” (in the case of ruote-kit). I would be glad if someone could help me
> make my mind clear on these issues.
>
> And if I am allowed to abuse your patience a bit more, I have also a couple of
> specific, and probably naive, questions:
>
> (1) If I define a participant by subclassing Ruote::StorageParticipant, do I
> still need to mix in Ruote::LocalParticipant?
>
> (2) In many examples of process definitions, participants are passed a :task
> parameter. Which makes me wonder: does that have a special meaning in Ruote? I
> have never seen an example of a participant implementation making use of the
> :task parameter, so I have always assumed that it is a name like another and
> have used it as follows:
>
> class MyParticipant
>   include Ruote::LocalParticipant
>
>   def consume(workitem)
>     case workitem.params[:task]
>     when 'do this' then dothis(workitem)
>     when 'do that' then dothat(workitem)
>     end
>   end
>
> private
>   def dothis(wi)
>     [...]
>   end
>
>   def dothat(wi)
>     [...]
>   end
> end
>
> Is this the intended usage pattern?
>
> Nicola

