Re: [ruote:3299] Re: Change storage implementations in production and other questions :)

John Mettraux Wed, 16 Nov 2011 20:04:37 -0800

On Wed, Nov 16, 2011 at 06:07:13PM -0800, Nathan wrote:
>
> Hi John. We've been hammering at this all week. We updated our MongoDB
> adapter to fix the schedule loop, made some adjustments to make it a
> bit faster and introduced a locking scheme for multi-worker
> concurrency. We tried the Redis storage but for some reason it wasn't
> processing all of our messages during our load tests, could be user
> error, but in any case we're sticking with the Mongo one for now even
> though I think it is probably somewhat slower.
>
> When loaded up with a number of simultaneous, large workflow launches
> that produce a number (8-10) of additional work items, things are still
> pretty slow. I noticed that the slowest workflows to go from launch to
> equilibrium have a large number of "set" expressions to set variables
> and fields.


Hello Nathan,

you could cut down the number of those initial "sets" by passing the initial
workitem fields and variables at launch time:

  fields = { 'a' => 'b' }
  variables = { 'c' => 'd' }
  wfid = dashboard.launch(pdef, fields, variables)

There is also the set_fields expression, which lets you set a batch of fields
with a single expression:

  set_fields :val => { 'customer' => { 'name' => 'Fred', 'age' => 40 } }

It's documented at:

  http://ruote.rubyforge.org/exp/restore.html

There is no "set_variables" expression; should you need one, it's very easy
to implement.

> We also have a lot of participant (and other) expressions
> that are conditional using "if" and are usually skipped.  We have used
> these pretty liberally in our workflow code.

You're right, 'if' and 'unless' are evaluated right after the expression has
been persisted initially. IIRC I do it this way to keep the code simpler, and
also to have the expression at hand in case the if/unless evaluation raises an
error (it then serves as a target for replay_at_error).

I want people to use if/unless liberally; it makes code much more readable.
If it eventually proves to be a big perf sink, we could optimize it: forgo
the initial persist, evaluate the condition and reply immediately.
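A toy model of that trade-off (hypothetical classes, not ruote's implementation): with the current order, every guarded expression costs one write even when it ends up skipped, while evaluating the guard first only writes the expressions that actually apply. The price of the second order is losing the persisted expression that replay_at_error could target.

```ruby
# Hypothetical model, not ruote's code: counts storage writes for a run of
# expressions guarded by :if, most of which evaluate to false.
class CountingStorage
  attr_reader :put_count

  def initialize
    @put_count = 0
  end

  def put(_doc)
    @put_count += 1
  end
end

# Current order as described above: persist first, then evaluate the guard.
# The persisted expression is what replay_at_error can target.
def apply_persist_then_eval(storage, doc, condition)
  storage.put(doc)
  condition ? :applied : :skipped
end

# Possible optimization: evaluate the guard first and reply immediately
# when it is false, so skipped expressions never reach the storage.
def apply_eval_first(storage, doc, condition)
  return :skipped unless condition
  storage.put(doc)
  :applied
end

guards = [false, false, true, false]

current = CountingStorage.new
guards.each_with_index { |g, i| apply_persist_then_eval(current, { 'expid' => "0_#{i}" }, g) }

optimized = CountingStorage.new
guards.each_with_index { |g, i| apply_eval_first(optimized, { 'expid' => "0_#{i}" }, g) }

puts "persist then eval: #{current.put_count} puts"   # 4
puts "eval first:        #{optimized.put_count} puts" # 1
```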

> In profiling, it turns out that nearly each variable set *appears* to
> cause the process to persist via a "put".

Yes, each time you set a variable, the expression holding that variable gets
persisted.
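To picture why a single "set" touches the storage, here is a simplified model. The Exp struct, holder_for and the storage class are made up for illustration, and real variable lookup in ruote handles more cases than this plain parent walk.

```ruby
# Simplified, hypothetical model (not ruote's code): variables live in an
# ancestor expression, typically the root. Setting one walks up the parent
# chain to that holder and persists the holder's whole document.
Exp = Struct.new(:expid, :parent, :variables)

class RecordingStorage
  attr_reader :saved

  def initialize
    @saved = []
  end

  def put(exp)
    @saved << exp.expid
  end
end

# Walk up until we find an expression that carries a variables hash.
def holder_for(exp)
  exp = exp.parent while exp.variables.nil? && exp.parent
  exp
end

def set_variable(storage, exp, name, value)
  holder = holder_for(exp)
  holder.variables[name] = value
  storage.put(holder) # one write of the holder per variable set
end

root     = Exp.new('0', nil, {})
sequence = Exp.new('0_0', root, nil)
leaf     = Exp.new('0_0_1', sequence, nil)

storage = RecordingStorage.new
%w[a b c].each_with_index { |name, i| set_variable(storage, leaf, name, i) }

puts storage.saved.join(', ') # 0, 0, 0 : the root was written three times
```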

> I think I can mitigate this
> to some extent by combining evented IO via event machine with writing
> a worker implementation that puts message dispatch into a push-fed EM
> event loop (instead of the standard polling loop), but I get the
> feeling the JSON serialization / de-serialization cost is adding up,
> and that of course is CPU bound.

Yes, this is costly. I'm using YAJL, which is even faster than Marshal on
some Ruby versions (I still have to test with 1.9.2+).
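If you want numbers for your own documents, a rough benchmark like the one below may help. It uses the stdlib json library rather than YAJL (an assumption, to keep it dependency-free), and the document shape and wfid are invented; results will vary with Ruby version and document size.

```ruby
# Rough sketch: round-trip cost of one expression-like document.
# Uses stdlib 'json' instead of YAJL; the document below is hypothetical.
require 'json'
require 'benchmark'

doc = {
  'type' => 'expressions',
  'fei' => { 'wfid' => '20111116-hypothetical', 'expid' => '0_0_1' },
  'variables' => (1..200).map { |i| ["var#{i}", "value #{i}"] }.to_h
}

n = 2_000
json_time    = Benchmark.realtime { n.times { JSON.parse(JSON.generate(doc)) } }
marshal_time = Benchmark.realtime { n.times { Marshal.load(Marshal.dump(doc)) } }

printf("JSON round trip:    %.1f us/doc (%d bytes)\n",
       json_time / n * 1e6, JSON.generate(doc).bytesize)
printf("Marshal round trip: %.1f us/doc (%d bytes)\n",
       marshal_time / n * 1e6, Marshal.dump(doc).bytesize)
```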

> If I modify the ruote code to force
> 'should_persist' to false in 'un_set_variable' the difference in
> performance is dramatic, but I bet my tests wouldn't pass that way,
> although I'm unsure of the ramifications actually.

Let's look at that code:

---8<---
01 def un_set_variable(op, var, val, should_persist)
02
03   if op == :set
04     Ruote.set(h.variables, var, val)
05   else # op == :unset
06     Ruote.unset(h.variables, var)
07   end
08
09   if should_persist && r = try_persist # persist failed, have to retry
10
11     @h = r
12     un_set_variable(op, var, val, true)
13
14   else # success (even when should_persist == false)
15
16     @context.storage.put_msg("variable_#{op}", 'var' => var, 'fei' => h.fei)
17   end
18 end
--->8---

If should_persist is false, the variable/value binding is changed in memory
only; it is not saved at this point.

If forcing it to false yields a great increase in performance, it probably
means #try_persist tends to be unsuccessful and triggers a recursion. You can
verify that by placing some logging output at line 10 and observing what
happens.
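A rough simulation of that retry path, with a hypothetical FlakyExpression whose try_persist rejects the first few writes (the put_msg notification is left out). The log line sits where line 10 is in the excerpt above; each rejected write means one more serialization and round trip to the storage.

```ruby
# Hypothetical simulation of the optimistic-retry path in un_set_variable.
# try_persist returns the fresher stored document when the write is rejected,
# nil on success, mirroring the excerpt above.
class FlakyExpression
  attr_reader :attempts, :log

  def initialize(conflicts)
    @conflicts = conflicts # how many writes will be rejected
    @attempts = 0
    @log = []
    @h = { 'variables' => {} }
  end

  def try_persist
    @attempts += 1
    return nil if @conflicts.zero?
    @conflicts -= 1
    { 'variables' => {} } # pretend this is the newer stored copy
  end

  def un_set_variable(op, var, val, should_persist)
    if op == :set
      @h['variables'][var] = val
    else
      @h['variables'].delete(var)
    end

    if should_persist && (r = try_persist)
      # "line 10": log every failed persist before retrying
      @log << "persist failed for #{var.inspect}, retrying (attempt #{@attempts})"
      @h = r
      un_set_variable(op, var, val, true)
    end
  end
end

exp = FlakyExpression.new(3)
exp.un_set_variable(:set, 'customer', 'Fred', true)
puts exp.log
puts "attempts: #{exp.attempts}" # attempts: 4 (3 rejected writes + 1 success)
```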

should_persist is set to false by some expressions that set a series of
variables (or other expression attributes) and then persist.
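A small sketch of why that pattern matters (hypothetical Holder class, with JSON.generate standing in for a storage put): writing after each of ten sets serializes a document that keeps growing, whereas setting them all with should_persist false and persisting once writes it a single time.

```ruby
# Hypothetical sketch: one write per variable vs. "set a series with
# should_persist = false, then persist once".
require 'json'

class Holder
  attr_reader :writes, :bytes_written

  def initialize
    @variables = {}
    @writes = 0
    @bytes_written = 0
  end

  def set_variable(var, val, should_persist)
    @variables[var] = val
    persist if should_persist
  end

  # Stand-in for a storage put: serialize the whole document.
  def persist
    @writes += 1
    @bytes_written += JSON.generate({ 'variables' => @variables }).bytesize
  end
end

vars = (1..10).map { |i| ["v#{i}", i] }

one_by_one = Holder.new
vars.each { |k, v| one_by_one.set_variable(k, v, true) }

batched = Holder.new
vars.each { |k, v| batched.set_variable(k, v, false) }
batched.persist

puts "one by one: #{one_by_one.writes} writes, #{one_by_one.bytes_written} bytes"
puts "batched:    #{batched.writes} write, #{batched.bytes_written} bytes"
```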

> My question is about when ruote decides it needs persist? My guess
> would have been that persistence only occurs just prior to unloading a
> workflow process because all paths have led to a dead end requiring
> external stimulus, but that doesn't seem to be the case.

Ruote "implements" concurrency by splitting workflow instances into
expressions. A concurrence expression places an "apply" message for each of
its children on the work queue, and if you have multiple workers, each of
those children may end up getting applied by a different worker.
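A thread-based model of that fan-out, assuming an in-process Queue stands in for the storage's message queue (in ruote the workers may be separate processes sharing the storage). Which worker applies which child varies from run to run.

```ruby
# Hypothetical model: a concurrence enqueues one "apply" message per child;
# several workers consume the shared queue, so siblings may be applied by
# different workers.
require 'thread'

queue   = Queue.new
applied = Queue.new # thread-safe collector of [worker, expid] pairs

# The concurrence (expid 0_0) enqueues one apply message per child.
children = %w[0_0_0 0_0_1 0_0_2 0_0_3]
children.each { |expid| queue << { 'action' => 'apply', 'expid' => expid } }

worker_count = 3
worker_count.times { queue << :stop } # one stop marker per worker, after the work

workers = (1..worker_count).map do |w|
  Thread.new do
    while (msg = queue.pop) != :stop
      applied << [w, msg['expid']] # "apply" the child
    end
  end
end
workers.each(&:join)

results = Array.new(applied.size) { applied.pop }
results.sort_by(&:last).each { |w, expid| puts "#{expid} applied by worker #{w}" }
```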

When an external answer comes back for a participant, only the corresponding
participant expression is fetched back from the storage.

There is no concept of loading all the expressions of a workflow instance and
then saving them all. One could imagine implementing a storage that does that
(in fact I have already done it, but it's proprietary software), but it
requires a guarantee that only one worker is processing the messages for a
given workflow instance at any time.
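A sketch of that one-worker-per-instance guarantee, using an in-process Mutex per wfid. The WfidLocks class is hypothetical; workers running in separate processes would need a lock held in the shared storage instead. Without the lock, two workers could load the same instance and one save would silently overwrite the other.

```ruby
# Hypothetical sketch: "load whole instance, process, save whole instance",
# guarded so only one worker handles a given wfid at a time.
require 'thread'

class WfidLocks
  def initialize
    @guard = Mutex.new
    @locks = {}
  end

  def synchronize(wfid, &block)
    lock = @guard.synchronize { @locks[wfid] ||= Mutex.new }
    lock.synchronize(&block)
  end
end

locks       = WfidLocks.new
store       = { 'wf-a' => 0, 'wf-b' => 0 } # wfid => version of the whole instance
store_guard = Mutex.new                    # protects the Hash itself

queue = Queue.new
20.times { |i| queue << (i.even? ? 'wf-a' : 'wf-b') }
4.times { queue << :stop }

workers = Array.new(4) do
  Thread.new do
    while (wfid = queue.pop) != :stop
      locks.synchronize(wfid) do
        version = store_guard.synchronize { store[wfid] }     # load the whole instance
        sleep 0.001                                           # process the message
        store_guard.synchronize { store[wfid] = version + 1 } # save the whole instance
      end
    end
  end
end
workers.each(&:join)

puts store.inspect # each instance ends at version 10, no lost updates
```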

I've been striving for the simplest possible concepts, and you're probably
hitting a limitation of my design, or there is something wrong with your
storage implementation (most likely both, plus certainly some inefficiencies
in ruote). Fortunately, we can measure.

(It'd be interesting to know the cost of persisting one big expression; maybe
you could log that, along with the expression size, from the storage
implementation.)
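Something along these lines could do the logging, assuming your storage exposes a put(doc) method (the wrapper, the in-memory storage and the document id are hypothetical). Measuring the size this way serializes the document a second time, so it's meant for a profiling run, not production.

```ruby
# Hypothetical wrapper: logs serialized size and elapsed time of each put.
require 'json'

# Stand-in for the real storage; the put(doc) signature is an assumption.
class InMemoryStorage
  def initialize
    @docs = {}
  end

  def put(doc)
    @docs[doc['_id']] = JSON.generate(doc)
    nil
  end
end

class TimedStorage
  def initialize(storage, log)
    @storage = storage
    @log = log
  end

  def put(doc)
    size = JSON.generate(doc).bytesize # extra serialization, for measurement only
    t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = @storage.put(doc)
    ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0) * 1000
    @log << format('put %s: %d bytes, %.3f ms', doc['_id'], size, ms)
    result
  end
end

log = []
storage = TimedStorage.new(InMemoryStorage.new, log)
doc = { '_id' => '0_0!hypothetical!20111116-xyz', 'variables' => { 'note' => 'x' * 10_000 } }
storage.put(doc)
puts log
```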

(Could we forgo JSON and use BSON hashes directly? Crazy question, I don't
know if it makes sense.)

> We have a lot of business rules modeled using flow expressions and variable
> sets, as well as a lot of conditional participant expressions, and I figured
> these were probably nearly free from a performance perspective. If
> this is not the case though, for instance if these branches and
> setting of variables are actually causing ruote to save the document
> and put a continuation on the message queue, we may need to refactor
> our workflows to put all those business rule calculations into
> external helpers.

Ruote tends to save as soon as possible so that it doesn't get caught with
inconsistent workflow instances. Persistence always has priority over
performance (hence the escape to multiple workers).

> I will say this though: digging through ruote's code and tests is
> teaching me a lot. Reading good code is always such a rewarding
> experience.

Thanks :-) Maybe I could have a look at the latest version of ruote-mongodb
with you.

Please don't hesitate to hammer the list with questions, the replies I just
wrote are probably not sufficient. I'm intrigued by your should_persist
finding.

I'd be happy if we/you found inconsistencies/inefficiencies in ruote with
this work.


Cheers,

--
John Mettraux - http://lambda.io/processi

-- 
you received this message because you are subscribed to the "ruote users" group.
to post : send email to [email protected]
to unsubscribe : send email to [email protected]
more options : http://groups.google.com/group/openwferu-users?hl=en
