Re: [ruote:2530] Integrating ruote with applications

Rich Meyers Wed, 21 Jul 2010 16:49:45 -0700

Hi John,

Apologies for the late reply.


> > 
> > 1. Ruote state is a black box. How do I know what workflows are running?
> 
> Hello Rich,
> 
> engine.processes
> 
> Warning, can be a costly operation depending on the storage implementation.
> 
> > What is the status of each running workflow?
> 
> engine.processes.each { |ps| p ps }
> p engine.process(wfid)

Correct me if I'm wrong, but for each running workflow I can only get its wfid. 
In practice, workflows have descriptive names (like download-google-report) 
that users and scripts use. Displaying the fact that a download-google-report 
workflow launched on 1/1/2001 is currently running requires me to maintain my 
own mapping of workflow names to wfids. This is the first requirement for 
having persistent storage parallel to ruote's storage.
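
To make this concrete, here is a minimal sketch of the kind of mapping I mean. 
WorkflowRegistry, its methods and the wfid strings are made up for 
illustration and are not part of ruote. A real version would be persisted 
rather than held in memory, and would be fed the wfids the engine reports as 
running.

```ruby
# Hypothetical in-memory registry mapping wfids to descriptive names.
# Not a ruote API; the wfids below are placeholder strings.
class WorkflowRegistry
  Entry = Struct.new(:name, :wfid, :launched_at)

  def initialize
    @entries = {}
  end

  # Record a launch under the wfid the engine handed back.
  def record(name, wfid, launched_at = Time.now)
    @entries[wfid] = Entry.new(name, wfid, launched_at)
  end

  # Drop an entry once the workflow is known to have terminated.
  def forget(wfid)
    @entries.delete(wfid)
  end

  # Given the wfids reported as running, return the entries we know about.
  def describe(running_wfids)
    running_wfids.map { |wfid| @entries[wfid] }.compact
  end
end

registry = WorkflowRegistry.new
registry.record('download-google-report', 'wfid-0001', Time.utc(2001, 1, 1))
registry.record('download-yahoo-report', 'wfid-0002', Time.utc(2001, 1, 2))

# 'wfid-unknown' stands for a workflow launched outside the registry.
running = registry.describe(['wfid-0001', 'wfid-unknown'])
running.each do |e|
  puts "#{e.name} launched on #{e.launched_at.strftime('%Y-%m-%d')} (#{e.wfid})"
end
```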

> 
> > Where is the list of workflows that finished?
> 
> Out of the box, there is no such list. Terminated workflows are simply 
> removed. You can write a history service to log them.

The history service may be a valid alternative. Does ruote storage support 
persisting arbitrary additional data (besides workflows/expressions)? If the 
history service is going to be more a part of ruote than of my own app, I'd 
like it to use ruote storage.

> > 2. Waiting for workflows is not reliable.
> 
> Waiting for workflows is used when testing ruote or when showing small 
> quickstarts.
> 
> > Ruote cannot wait for workflows that finish before the wait starts.
> 
> This could help (2.1.11) :
> 
> http://github.com/jmettraux/ruote/commit/0eb09d354992e27778f87fe64d418632b4281d9c
> 

Thanks, this seems like a step in the right direction. For my purposes, 
however, @seen would probably need to be unbounded and persistent. In 
particular, without a persistent @seen, starting a fresh process to wait for a 
workflow that finished a long time ago wouldn't work, from what I can see.

> If you launch a workflow and wait for it, why use Ruote ? A simple, 
> classical, synchronous script would be better.

I have scripts now and they do work. The reason for investigating ruote is to 
achieve better code factoring. Workflows are actually composed of well-defined 
parts, and being able to obtain the status of each part individually, for 
example, would be quite handy.

> 
> > 3. Ruote does not limit concurrency. If some of my workflows use external 
> > resources such as downloading files from the internet I don't want an 
> > unbounded number of these workflows running at once.
> 
> Ruote is written in Ruby. When running on MRI (C) Ruby, there is only 1 thread 
> on at a time. Ruote brings no magic concurrency to the table. For 1 worker, 
> there is only 1 workflow instance performing something at a time.

In ruby code, yes. But I'm using curb (curl binding for ruby) which allows me 
to have multiple concurrent downloads being managed by C code.

> 
> You could introduce some limitation in your "downloading files from the 
> internet" participant by yourself. The engine will be quietly waiting for 
> your participant's answer.

I wrote a semaphore implementation that works 50% of the time. The other 50% 
of the time the Ruby runtime hangs, I suspect due to interactions between 
activerecord and threads.

> 
> > My solution so far has been to implement my own tracking of workflows, 
> > waiting and locking. Since our application is built on rails I have used 
> > activerecord for persistence.
> 
> Do you mean you are using activerecord as the base for a ruote storage 
> implementation ?

I looked at ruote-activerecord. Its code is not compatible with the current 
version of ruote. I considered making it work or trying datamapper instead. 
The Ruby runtime hangs are discouraging me from trying to tie activerecord to 
the ruote core.

> 
> > It seems that activerecord still does not play nice with threads though as 
> > my ruby (1.8) runtime hangs itself every other time I try to run a workflow 
> > now. For all practical purposes the application I built is unusable. I have 
> > ideas how to hack my way out of this mess but I took a step back and asked 
> > myself why I'm doing all these things in the first place.
> 
> Connection pool issues ?

Whatever the issues are, they are impossible to debug because the process only 
responds to kill -9. I'm guessing getting rid of activerecord will be the first 
debugging step.

> 
> > I considered typical ruote use cases mentioned in the documentation and on 
> > the mailing list and I suppose when dealing with human processes that are 
> > persisted externally some or all of these issues do not appear. I'm 
> > starting to think that ruote works ok for state transitions for objects 
> > persisted elsewhere but I don't see how it can effectively manage processes 
> > that only exist in ruote.
> 
> Could you please expand on that ? Ruote isn't about state/transitions. I 
> understand the part "for objects persisted elsewhere", where ruote processes 
> alter the state of objects, but I don't get the "I don't see how it (ruote) 
> can effectively manage processes that only exist in ruote".

Suppose I want to check stock prices on various exchanges every day. I have one 
workflow for each exchange that knows how to get data for a particular stock 
and parse it. I launch these workflows from cron daily.

I need to know:

- What stocks are currently being downloaded from what exchanges?
- What workflows have been running for over an hour? What step are they on 
right now?
- Were there any downloading or parsing failures in the last week? What were 
the causes?

Ideally I would like ruote to manage the workflows from start to finish. As 
such, I would like ruote to tell me that GOOG is currently being retrieved from 
the NY stock exchange, and that yesterday all retrievals failed because a 
connection to the stock exchange couldn't be established.

On my site users indicate which stock quotes they want to see, and I retrieve 
only those stock quotes. But I may have a sudden spike in user activity. I want 
to limit the number of active downloads to 10 per stock exchange. I don't want 
to limit the number of stock quotes that are being parsed since it's a 
relatively quick operation. I also don't know how long each download would 
take, and I want everything to be downloaded as soon as possible after 
scheduled downloads begin. Ideally I want to submit a potentially huge list of 
stocks to ruote and have it only invoke 10 download participants at a time.
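
The behaviour I'm after can be sketched in plain Ruby, outside ruote, as a 
fixed pool of workers draining a queue. each_with_limit and the stock symbols 
are hypothetical names, and the sleep stands in for a real download. This is 
only an illustration of the limit itself: nothing here is persisted, and it 
says nothing about the activerecord hangs.

```ruby
require 'thread'

# Hypothetical helper, not a ruote API: yield every item to the block,
# with at most `limit` invocations in flight at any time.
def each_with_limit(items, limit)
  queue = Queue.new
  stop = Object.new                      # unique end-of-work marker
  items.each { |item| queue << item }
  limit.times { queue << stop }          # one marker per worker

  workers = Array.new(limit) do
    Thread.new do
      until (item = queue.pop).equal?(stop)
        yield item
      end
    end
  end
  workers.each(&:join)
end

stocks = (1..50).map { |i| "STOCK#{i}" } # placeholder symbols
mutex = Mutex.new
in_flight = 0
peak = 0
done = []

each_with_limit(stocks, 10) do |symbol|
  mutex.synchronize do
    in_flight += 1
    peak = [peak, in_flight].max
  end
  sleep 0.01                             # stands in for the actual download
  mutex.synchronize do
    in_flight -= 1
    done << symbol
  end
end

puts "downloaded #{done.size} symbols, peak concurrency #{peak}"
```

A per-exchange limit would presumably mean one such pool per exchange, while 
parsing would run outside the pool since it needs no limit.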

And, again, all of this should be persisted in ruote storage.

Normally workflows are asynchronous in the sense that the cron script launches 
them and does not wait for completion. A separate cron job would check for 
errors daily after some suitable interval. However, during development I want 
to run the workflows synchronously so that if any part of them fails I find out 
about the error as soon as possible. Thus I need to be able to run the 
workflows both synchronously and asynchronously.
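
As a rough sketch of what I mean by the two modes, again in plain Ruby rather 
than ruote: the launch method and its sync flag are hypothetical names. In 
synchronous mode a failure surfaces immediately in the caller. In asynchronous 
mode the failure is captured and can be inspected later, which is roughly what 
the separate error-checking cron job would do.

```ruby
# Hypothetical wrapper, not a ruote API.
# sync = true : run the work in the caller; exceptions propagate at once.
# sync = false: run the work in a thread; the thread's value is nil on
#               success or the exception object on failure.
def launch(sync = false, &work)
  if sync
    work.call
    nil
  else
    Thread.new do
      begin
        work.call
        nil
      rescue StandardError => e
        e
      end
    end
  end
end

failing = lambda { raise 'connection refused' }

# Development: find out about the error as soon as it happens.
sync_error = begin
  launch(true, &failing)
  nil
rescue RuntimeError => e
  e.message
end

# Production: launch and move on; check the outcome later.
async_error = launch(false, &failing).value

puts "sync:  #{sync_error}"
puts "async: #{async_error.message}"
```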

Rich

