Hello,
I am pretty new to Ruote and I would like some suggestions/guidelines about how
to develop an architecture for the scenario I am going to describe. Pointers to
existing code would also be appreciated.
A lab needs to run automatic and computationally expensive data analyses (which
can last days or even weeks); each such task is the execution of one or more
command-line tools, currently launched manually from a shell. Some tasks must
run sequentially and others may run concurrently. When a task is over,
typically some manual inspection of the output must be carried out to decide
whether the workflow can proceed to the next (time-expensive) analysis.
Sometimes (more often than not), tasks are not completed because an error
occurs (for example, the task hits a memory limit and is killed by the
operating system), so a human agent must decide what to do (relaunch, cancel,
start a different workflow, etc.). The reason I'd like to use Ruote for this instead
of some job scheduler is that there is a non-linear mix of computer and human
tasks to be performed, and a job scheduler is not flexible enough (in
particular, it does not handle the human part).
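To make the failure case concrete, here is a minimal plain-Ruby sketch (no Ruote involved) of how I imagine telling apart a task that completed, one that failed, and one that was killed by the operating system. The name run_and_classify is my own invention for illustration, and I am assuming a Unix-like system where a kill shows up as a signal in Process::Status:

```ruby
# Hypothetical helper (not part of Ruote): run a command-line tool,
# wait for it to finish, and classify how it ended.
def run_and_classify(*cmd)
  pid = Process.spawn(*cmd)        # start the tool in its own OS process
  _, status = Process.wait2(pid)   # block until it ends, get Process::Status

  if status.signaled?
    # Terminated by a signal, e.g. SIGKILL when the OS kills the task
    { outcome: :killed, signal: status.termsig }
  elsif status.success?
    { outcome: :completed, exit_code: 0 }
  else
    { outcome: :failed, exit_code: status.exitstatus }
  end
end
```

The idea would be to store the returned hash in the workitem, so that the human agent who picks it up can see why the task stopped before deciding what to do next.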
The main problem I am facing is: what is the best way to execute the computer
tasks in separate (Ruby) processes, and start/stop them and track their status,
say, from a web interface à la ruote-kit? As far as I can see, Ruote can spawn a
new thread when it hands a workitem to a participant, but not a new (Ruby)
process. Who should be responsible for spawning a separate Ruby process? Should
it be done in the participant? Or in the “main” program? Or should I use some
client-server architecture?
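To show what I mean by start/stop/track, this is roughly the kind of object I picture living somewhere (in the participant? elsewhere?). It is only a plain-Ruby sketch under my own assumptions: ExternalJob and its methods are hypothetical names, nothing here comes from Ruote, and it assumes a Unix-like system:

```ruby
# Hypothetical job handle (not part of Ruote): runs a command in a
# separate OS process and lets the caller start it, stop it, and poll it.
class ExternalJob
  attr_reader :pid

  def initialize(*cmd)
    @cmd = cmd
    @pid = nil
    @waiter = nil
  end

  # Launch the command and return immediately.
  def start
    @pid = Process.spawn(*@cmd)
    @waiter = Process.detach(@pid)  # background thread that reaps the child
    self
  end

  # :not_started, :running, :completed, :failed or :killed
  def status
    return :not_started unless @waiter
    return :running if @waiter.alive?

    st = @waiter.value              # Process::Status of the finished child
    if st.signaled? then :killed
    elsif st.success? then :completed
    else :failed
    end
  end

  # Ask the process to stop by sending it a signal (TERM by default).
  def stop(signal = 'TERM')
    Process.kill(signal, @pid) if status == :running
  rescue Errno::ESRCH
    nil                             # process already gone
  end

  # Block until the process has ended, then report its final status.
  def wait
    @waiter.value if @waiter
    status
  end
end
```

With something like this, `ExternalJob.new('some_tool', 'arg').start` would return at once, and a web interface could poll `status` or call `stop`. What I cannot figure out is which component should own such an object.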
I have a feeling that workers play a role here, but how they... work (ehm) is
not that clear to me yet. I have read both the Ruote-Kit Readme and the blog
post about Ruote 2.1, but what code like this does
    storage = Ruote::FsStorage.new('ruote_work')
    worker = Ruote::Worker.new(storage)
    worker.run
    # current thread is now running worker
and whether it is self-contained is still a mystery to me (how does this know
what to look for in the storage? What does it “run”?). When I run it, a script
like the above either gets stuck or gives an error like “no JSON backend
found” (in the case of ruote-kit). I would be glad if someone could help me
clear up my confusion on these issues.
And if I am allowed to abuse your patience a bit more, I also have a couple of
specific, and probably naive, questions:
1) If I define a participant by subclassing Ruote::StorageParticipant, do I
still need to mix in Ruote::LocalParticipant?
2) In many examples of process definitions, participants are passed a :task
parameter. Which makes me wonder: does that have a special meaning in Ruote? I
have never seen an example of a participant implementation making use of the
:task parameter, so I have always assumed that it is a name like any other and
have used it as follows:
    class MyParticipant
      include Ruote::LocalParticipant

      def consume(workitem)
        case workitem.params[:task]
        when 'do this' then dothis(workitem)
        when 'do that' then dothat(workitem)
        end
      end

      private

      def dothis(wi)
        [...]
      end

      def dothat(wi)
        [...]
      end
    end
Is this the intended usage pattern?
Nicola