[ruote:3306] Re: Change storage implementations in production and other questions :)

Nathan Fri, 18 Nov 2011 16:56:50 -0800

Thank you for the detailed response. I think I'm being too aggressive
with performance goals, so I'm going to deploy things as I have them
now and see how it runs in production. Our load tests were simulating
substantially more concurrent users than we really have, so I think
we'll be ok for some time (crossing my fingers anyway).


I do agree that the design of ruote is heavily focused on reliability
and consistency over speed, which is definitely what you want in the
kind of system ruote was designed for. In the future it would be nice
if ruote could optionally be tuned for speed at the cost of consistency
in the face of failure, specifically for real-time, user-facing
systems. If that day ever comes I have some thoughts, but for now I
think we can get by just fine with a couple more worker processes and
the tweaks I've already made to the mongo storage.

Your comment about BSON vs JSON did make me realize that I was
needlessly serializing documents to JSON, or using Rufus::Json.dup,
even though they would then be serialized by the mongo driver into
BSON, so I quit doing that. It caused a few minor problems, which you
can see from this workaround method:

https://gist.github.com/1378141

Also, one unit test is failing for the mongo driver, but I'm guessing
the rest of ruote is not so strict, as the functional tests pass:

test/unit/storage.rb:167
<"z"> expected but was <:z>

BSON will hand back a symbol if a symbol was put in. For document
values (not keys) is this really an issue for ruote?
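
A minimal sketch of one way to get string values back without a JSON
round-trip, assuming the storage is free to normalize documents before
handing them to the driver. The helper name stringify_symbols is
hypothetical; it is not part of ruote or the mongo driver.

```ruby
# Hypothetical helper, not part of ruote or the mongo driver:
# recursively turn Symbol keys and values into Strings, the way a
# JSON round-trip would, without actually serializing to JSON.
def stringify_symbols(obj)
  case obj
  when Hash
    obj.each_with_object({}) do |(k, v), h|
      h[k.is_a?(Symbol) ? k.to_s : k] = stringify_symbols(v)
    end
  when Array
    obj.map { |e| stringify_symbols(e) }
  when Symbol
    obj.to_s
  else
    obj
  end
end

doc = { 'a' => :z, 'list' => [:x, 1, { :k => :v }] }
clean = stringify_symbols(doc)
```

This walks the whole document on every write, so it trades some of the
CPU saved by skipping JSON for consistent string values.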

Finally, as a side note, I did adapt the mongo driver to use em-mongo
and evented IO, and I wrote an asynchronous worker to take advantage
of that, but it performed much much worse than serial IO. I believe
this is because of the vast number of writes that require locking, so
with the high speed loop I'm thinking lock contention combined with
the high CPU load was too much.

As for the mongo storage, I would love to go through the code with
you. I'm sure it could be made much better. I'm not sure how we would
do it, but I'm very open to it any time you like.

Thanks again for your help. I may have said this before, but Ruote is
the single best supported open source project I've used. I don't know
how you find the time, but your responsiveness, detail and patience
are really a very strong feature of Ruote.

Nathan



On Nov 16, 8:02 pm, John Mettraux <[email protected]> wrote:
> On Wed, Nov 16, 2011 at 06:07:13PM -0800, Nathan wrote:
>
> > Hi John. We've been hammering at this all week. We updated our MongoDB
> > adapter to fix the schedule loop, made some adjustments to make it a
> > bit faster and introduced a locking scheme for multi-worker
> > concurrency. We tried the Redis storage but for some reason it wasn't
> > processing all of our messages during our load tests, could be user
> > error, but in any case we're sticking with the Mongo one for now even
> > though I think it is probably somewhat slower.
>
> > When loaded up with a number of simultaneous, large workflow launches
> > that produce a number (8-10) of additional work items, things are still
> > pretty slow. I noticed that the slowest workflows to go from launch to
> > equilibrium have a large number of "set" expressions to set variables
> > and fields.
>
> Hello Nathan,
>
> you could cut down the number of those initial "sets", by passing the initial
> workitem fields and variables when launching:
>
>   fields = { 'a' => 'b' }
>   variables = { 'c' => 'd' }
>   wfid = dashboard.launch(pdef, fields, variables)
>
> There is also the set_fields expression which lets you set a batch of fields
> with one expression:
>
>   set_fields :val => { 'customer' => { 'name' => 'Fred', 'age' => 40 } }
>
> It's documented at:
>
>  http://ruote.rubyforge.org/exp/restore.html
>
> There is no "set_variables" expression, should you need one, it's very easy
> to implement.
>
> > We also have a lot of participant (and other) expressions
> > that are conditional using "if" and are usually skipped. We have used
> > these pretty liberally in our workflow code.
>
> You're right, 'if' and 'unless' are evaluated right after the expression got
> persisted initially. IIRC I do it this way in order to have simpler code (and
> also in order to have the expression at hand in case of error in the
> if/unless, as a target for replay_at_error).
>
> I want people to use if/unless liberally, it makes code much more readable.
> Eventually, if it proves a big perf sink, we could optimize that and forgo
> the initial persist, eval and reply immediately.
>
> > In profiling, it turns out that nearly each variable set *appears* to
> > cause the process to persist via a "put".
>
> Yes, each time you save a variable that results in having the expression
> holding the variable getting persisted.
>
> > I think I can mitigate this
> > to some extent by combining evented IO via event machine with writing
> > a worker implementation that puts message dispatch into a push-fed EM
> > event loop (instead of the standard polling loop), but I get the
> > feeling the JSON serialization / de-serialization cost is adding up,
> > and that of course is CPU bound.
>
> Yes, this is costly, I'm using YAJL, it's even faster than Marshal for some
> versions of Ruby (I have to test with 1.9.2+).
>
> > If I modify the ruote code to force
> > 'should_persist' to false in 'un_set_variable' the difference in
> > performance is dramatic, but I bet my tests wouldn't pass that way,
> > although I'm unsure of the ramifications actually.
>
> Let's look at that code
>
> ---8<---
> 01 def un_set_variable(op, var, val, should_persist)
> 02
> 03   if op == :set
> 04     Ruote.set(h.variables, var, val)
> 05   else # op == :unset
> 06     Ruote.unset(h.variables, var)
> 07   end
> 08
> 09   if should_persist && r = try_persist # persist failed, have to retry
> 10
> 11     @h = r
> 12     un_set_variable(op, var, val, true)
> 13
> 14   else # success (even when should_persist == false)
> 15
> 16     @context.storage.put_msg("variable_#{op}", 'var' => var, 'fei' => h.fei)
> 17   end
> 18 end
> --->8---
>
> If should_persist is false, it means the variable/value binding is not saved.
>
> If it yields a great increase in performance, it probably means #try_persist
> tends to be unsuccessful and triggers a recursion. You can verify that by
> placing some logging output at line 10 and observe what happens.
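
A toy sketch of the retry behaviour described above, assuming an
optimistic-locking storage whose put returns nil on success and the
current document when the revision is stale. ToyStorage and
set_with_retry are hypothetical stand-ins, not ruote classes; the point
is only to show how counting or logging retries reveals contention.

```ruby
# Toy stand-in for a storage with optimistic locking (an assumption for
# illustration; ruote's real storages differ). put returns nil on
# success, or the current stored document when the revision is stale.
class ToyStorage
  def initialize(doc)
    @doc = doc
  end

  def get
    @doc.dup
  end

  def put(doc)
    return @doc.dup if doc['_rev'] != @doc['_rev'] # stale: caller must retry
    @doc = doc.merge('_rev' => doc['_rev'] + 1)
    nil
  end
end

# Apply a change and persist it, retrying on stale revisions and
# counting how many retries were needed.
def set_with_retry(storage, doc, key, value, retries = 0)
  doc = doc.merge(key => value)
  fresh = storage.put(doc)
  return retries if fresh.nil?
  warn "persist failed, retry ##{retries + 1}"
  set_with_retry(storage, fresh, key, value, retries + 1)
end

storage = ToyStorage.new('_rev' => 0, 'variables' => {})
stale = storage.get            # worker A reads revision 0
storage.put(storage.get)       # worker B writes first, revision becomes 1
retry_count = set_with_retry(storage, stale, 'v', 1)
```

With several workers touching the same expression, a high retry count
per set would point at contention rather than at raw storage speed.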
>
> should_persist is set to false, by some expressions that set series of
> variables (or other expression attributes) and then persist.
>
> > My question is about when ruote decides it needs persist? My guess
> > would have been that persistence only occurs just prior to unloading a
> > workflow process because all paths have led to a dead end requiring
> > external stimulus, but that doesn't seem to be the case.
>
> Ruote "implements" concurrency by splitting workflow instances into
> expressions. A concurrence expression places an "apply" message for each of
> its children on the work queue, and if you have multiple workers, each of
> those children may end up getting applied by a different worker.
>
> When an external answer comes back for a participant, only the
> required participant expression is fetched back from the storage.
>
> There is no concept of loading all the expressions of a workflow instance and
> then saving them all. One could imagine implementing a storage that does
> that (in fact I have already done so, but it's proprietary software), but it
> requires a guarantee that only one worker is processing the messages for a
> given workflow instance at any time.
>
> I've been striving for the simplest possible concepts and you're probably
> hitting a limitation of my design or there is something wrong with your
> storage implementation (most likely both + some inefficiencies in ruote
> certainly). Fortunately, we can measure.
>
> (It'd be interesting to know the cost of persisting one big expression, maybe
> you could log that (with exp size) from the storage implementation).
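
One way such a measurement could be sketched, assuming the storage
exposes a put(doc) method. TimedStorage and MemoryStorage are
hypothetical names used for illustration only, and the size is
approximated by the JSON length of the document rather than its BSON
size.

```ruby
require 'benchmark'
require 'json'

# Hypothetical wrapper (names are illustrative, not ruote API): times
# each put on the wrapped storage and records the serialized size of
# the document.
class TimedStorage
  attr_reader :samples

  def initialize(storage)
    @storage = storage
    @samples = []
  end

  def put(doc)
    result = nil
    seconds = Benchmark.realtime { result = @storage.put(doc) }
    @samples << { :bytes => JSON.generate(doc).bytesize, :seconds => seconds }
    result
  end
end

# In-memory stand-in so the sketch runs on its own.
class MemoryStorage
  def initialize
    @docs = {}
  end

  def put(doc)
    @docs[doc['_id']] = doc
    nil
  end
end

timed = TimedStorage.new(MemoryStorage.new)
timed.put('_id' => 'exp-1', 'variables' => { 'a' => 'b' * 1000 })
sample = timed.samples.first
puts format('%d bytes in %.6f s', sample[:bytes], sample[:seconds])
```

The size is computed outside the timed block, so the measurement
covers only the wrapped put call.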
>
> (Could we forgo JSON and use BSON hashes directly ? Crazy question, don't
> know if it makes sense)
>
> > We have a lot of business rules modeled using flow expressions and variable
> > sets, as well as a lot of conditional participant expressions, and I figured
> > these were probably nearly free from a performance perspective. If
> > this is not the case though, for instance if these branches and
> > setting of variables are actually causing ruote to save the document
> > and put a continuation on the message queue, we may need to refactor
> > our workflows to put all those business rule calculations into
> > external helpers.
>
> Ruote tends to save as soon as possible so that it doesn't get caught with
> inconsistent workflow instances. Persistence always has priority over
> performance (hence the escape to multiple workers).
>
> > I will say this though: digging through ruote's code and tests is
> > teaching me a lot. Reading good code is always such a rewarding
> > experience.
>
> Thanks :-) Maybe I could have a look at the latest version of ruote-mongodb
> with you.
>
> Please don't hesitate to hammer the list with questions, the replies I just
> wrote are probably not sufficient. I'm intrigued by your should_persist
> finding.
>
> I'd be happy if we/you found inconsistencies/inefficiencies in ruote with
> this work.
>
> Cheers,
>
> --
> John Mettraux -http://lambda.io/processi

-- 
you received this message because you are subscribed to the "ruote users" group.
to post : send email to [email protected]
to unsubscribe : send email to [email protected]
more options : http://groups.google.com/group/openwferu-users?hl=en
