[ruote:3433] Ruote to orchestrate time-expensive computer processes?

Nicola Tue, 28 Feb 2012 08:56:26 -0800

Hello,
I am pretty new to Ruote and I would like some suggestions/guidelines about how 
to develop an architecture for the scenario I am going to describe. Pointers to 
existing code would also be appreciated.


A lab needs to run automated, computationally expensive data analyses (which
can last days or even weeks); each such task consists of running one or more
command-line tools, which are currently launched manually from a shell. Some
tasks must run sequentially, while others may run concurrently. When a task
finishes, its output typically has to be inspected manually to decide whether
the workflow can proceed to the next (time-expensive) analysis. Quite often a
task does not complete because an error occurs (for example, the task hits a
memory limit and is killed by the operating system), so a human must decide
what to do (relaunch, cancel, start a different workflow, etc.). The reason I'd
like to use Ruote for this instead of a job scheduler is that there is a
non-linear mix of computer and human tasks to perform, and a job scheduler is
not flexible enough (in particular, it does not handle the human part).

The main problem I am facing is: what is the best way to execute the computer
tasks in separate (Ruby) processes, start/stop them, and track their status,
say, from a web interface à la ruote-kit? As far as I can see, Ruote can spawn a
new thread when it hands a workitem to a participant, but not a new (Ruby)
process. Who should be responsible for spawning a separate Ruby process? Should
it be done in the participant? In the “main” program? Or should I use some
client-server architecture?
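To make the process-management part concrete, independently of Ruote: the sketch below uses a hypothetical ExternalTask class (not a Ruote API) that launches a command-line tool in its own OS process with Process.spawn, polls it without blocking via Process.wait2 with Process::WNOHANG, and stops it with SIGTERM. It assumes a POSIX system. Telling :killed (terminated by a signal, e.g. by the OS on a memory limit) apart from :failed (non-zero exit code) is the information a human would need to choose between relaunching and cancelling.

```ruby
require 'rbconfig'

# Hypothetical helper, not part of Ruote: runs one command-line tool as a
# separate OS process and lets the caller poll or stop it. Assumes POSIX.
class ExternalTask
  attr_reader :pid, :exit_status

  def initialize(*command)
    @command = command
    @pid = nil
    @exit_status = nil
  end

  # Launches the command in its own process and returns immediately.
  def start
    @pid = Process.spawn(*@command)
    self
  end

  # Non-blocking check: :not_started, :running, :done, :failed or :killed.
  def status
    return :not_started if @pid.nil?
    if @exit_status.nil?
      # wait2 with WNOHANG returns nil while the child is still running
      _pid, st = Process.wait2(@pid, Process::WNOHANG)
      @exit_status = st
    end
    return :running if @exit_status.nil?
    return :killed if @exit_status.signaled?   # terminated by a signal
    @exit_status.success? ? :done : :failed    # exit code 0 vs non-zero
  end

  # Asks a still-running process to terminate.
  def stop
    Process.kill('TERM', @pid) if status == :running
  end
end
```

Where such an object would live (inside a participant, or in a separate long-running process that reports back to the engine) is the open question above; the sketch only covers spawning, polling and stopping.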

I have a feeling that workers play a role here, but how they... work (ehm) is
not that clear to me yet. I have read both the ruote-kit README and the blog
post about Ruote 2.1, but what code like this does

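require 'rufus-json/automatic'   # assumption: loads an available JSON library first
require 'ruote'
require 'ruote/storage/fs_storage'

storage = Ruote::FsStorage.new('ruote_work')
worker = Ruote::Worker.new(storage)
worker.run
  # blocks: the current thread is now running the worker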

and whether it is self-contained is still a mystery to me (how does it know
what to look for in the storage? What does it “run”?). When I run a script like
the one above, it either gets stuck or fails with an error like “no JSON backend
found” (in the case of ruote-kit). I would be glad if someone could help me get
these issues clear in my mind.
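A toy model of what a storage-plus-worker loop does, assuming (as I understand Ruote 2.x's design) that a worker repeatedly polls the storage for pending messages and processes them. ToyStorage and ToyWorker are hypothetical stand-ins, not Ruote's classes, and "processing" is reduced to recording each message. Note that run never returns until shutdown is called, which would be consistent with a script that calls worker.run appearing to get stuck.

```ruby
# Toy model, not Ruote's implementation: a shared "storage" holds pending
# messages; a worker loops, takes the next message and processes it.
class ToyStorage
  def initialize
    @msgs = Queue.new
  end

  def put_msg(msg)
    @msgs << msg
  end

  # Returns the next pending message, or nil if there is none.
  def get_msg
    @msgs.pop(true)   # non-blocking pop raises ThreadError when empty
  rescue ThreadError
    nil
  end
end

class ToyWorker
  attr_reader :processed

  def initialize(storage)
    @storage = storage
    @processed = []
    @running = false
  end

  # Blocks the calling thread: poll the storage, process, repeat.
  def run
    @running = true
    while @running
      msg = @storage.get_msg
      if msg.nil?
        sleep 0.01          # nothing pending; poll again shortly
      else
        @processed << msg   # stand-in for real message handling
      end
    end
  end

  # Same loop, but in a background thread so the caller keeps control.
  def run_in_thread
    Thread.new { run }
  end

  def shutdown
    @running = false
  end
end
```

Here put_msg plays roughly the role that launching a process through the engine plays in Ruote: something outside the worker writes to the shared storage, and the worker only knows how to read from it.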


And if I may abuse your patience a bit more, I also have a couple of specific,
and probably naive, questions:

(1) If I define a participant by subclassing Ruote::StorageParticipant, do I
still need to mix in Ruote::LocalParticipant?

(2) In many examples of process definitions, participants are passed a :task
parameter, which makes me wonder: does it have a special meaning in Ruote? I
have never seen a participant implementation that makes use of the :task
parameter, so I have always assumed that it is a name like any other, and I
have used it as follows:

require 'ruote'

class MyParticipant
  include Ruote::LocalParticipant

  def consume(workitem)
    # params are serialized as JSON, so the key is the string 'task'
    case workitem.params['task']
    when 'do this' then dothis(workitem)
    when 'do that' then dothat(workitem)
    end
  end

  private

  def dothis(wi)
    # [...]
  end

  def dothat(wi)
    # [...]
  end
end

Is this the intended usage pattern?
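On question (1), the answer hinges on plain Ruby semantics: a module included by a superclass is inherited by every subclass, so a subclass does not need to include it again. The sketch uses hypothetical stand-ins (LocalParticipantLike, StorageParticipantLike), not Ruote's classes. Whether Ruote::StorageParticipant actually includes Ruote::LocalParticipant in a given Ruote version can be checked directly with Ruote::StorageParticipant.include?(Ruote::LocalParticipant).

```ruby
# Hypothetical stand-ins for illustration only; these are not Ruote's classes.
module LocalParticipantLike
  def consume(workitem)
    "consumed #{workitem}"
  end
end

class StorageParticipantLike
  include LocalParticipantLike   # the parent class mixes the module in
end

class MyStorageParticipant < StorageParticipantLike
  # no second include: the module is inherited from the superclass
end
```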
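On question (2), the case-based dispatch above can also be written as a lookup table, so that an unrecognized task fails loudly instead of being silently ignored (the case statement has no else branch). This is a minimal sketch with hypothetical names: FakeWorkitem stands in for a Ruote workitem and exposes only params and fields, and TaskDispatcher is not a Ruote class. It looks up 'task' as a string key, on the assumption that workitem params come back string-keyed after JSON serialization.

```ruby
# Hypothetical stand-in for a Ruote workitem: just params and fields.
FakeWorkitem = Struct.new(:params, :fields)

class TaskDispatcher
  # Maps the value of the 'task' attribute to a handler method.
  HANDLERS = {
    'do this' => :dothis,
    'do that' => :dothat
  }.freeze

  def consume(workitem)
    task = workitem.params['task']
    handler = HANDLERS.fetch(task) do
      raise ArgumentError, "unknown task: #{task.inspect}"
    end
    send(handler, workitem)
  end

  private

  def dothis(wi)
    wi.fields['result'] = 'did this'
    wi
  end

  def dothat(wi)
    wi.fields['result'] = 'did that'
    wi
  end
end
```

Adding a task then means adding one entry to HANDLERS and one private method.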

Nicola

-- 
you received this message because you are subscribed to the "ruote users" group.
to post : send email to [email protected]
to unsubscribe : send email to [email protected]
more options : http://groups.google.com/group/openwferu-users?hl=en
