Hi John.

We've been hammering at this all week. We updated our MongoDB adapter to fix the schedule loop, made some adjustments to make it a bit faster, and introduced a locking scheme for multi-worker concurrency. We tried the Redis storage, but for some reason it wasn't processing all of our messages during our load tests. That could be user error, but in any case we're sticking with the Mongo one for now, even though I think it is probably somewhat slower.

When loaded up with several simultaneous, large workflow launches that each produce a number (8-10) of additional work items, things are still pretty slow. I noticed that the workflows slowest to go from launch to equilibrium have a large number of "set" expressions setting variables and fields. We also have a lot of participant (and other) expressions that are conditional via "if" and are usually skipped. We have used these pretty liberally in our workflow code (rough sketch below).

In profiling, it turns out that nearly every variable set *appears* to cause the process to persist via a "put". I think I can mitigate this to some extent by combining evented IO via EventMachine with a worker implementation that puts message dispatch into a push-fed EM event loop instead of the standard polling loop (also sketched below), but I get the feeling the JSON serialization / deserialization cost is adding up, and that of course is CPU bound. If I modify the ruote code to force 'should_persist' to false in 'un_set_variable', the difference in performance is dramatic, but I bet my tests wouldn't pass that way, and I'm actually unsure of the ramifications.

My question is: when does ruote decide it needs to persist? My guess would have been that persistence only occurs just before a workflow process is unloaded, i.e. when all paths have reached a dead end requiring external stimulus, but that doesn't seem to be the case. We have a lot of business rules modeled with flow expressions and variable sets, as well as a lot of conditional participant expressions, and I figured these were probably nearly free from a performance perspective. If that's not the case, though (for instance, if those branches and variable sets are actually causing ruote to save the document and put a continuation on the message queue), we may need to refactor our workflows to move all those business rule calculations into external helpers.

I will say this, though: digging through ruote's code and tests is teaching me a lot. Reading good code is always such a rewarding experience.

Thank you so much for your time,

Nathan
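
For concreteness, here is roughly the shape of our process definitions. The names, fields and thresholds are invented, but the proportion of "set" expressions and usually-skipped conditional participants is representative:

  require 'ruote'

  # invented example, just to show the shape of what we do a lot of
  pdef = Ruote.process_definition :name => 'order_review' do

    set 'v:approval_threshold' => 500
    set 'f:status' => 'pending'

    # one of these two is almost always skipped, depending on the fields
    participant 'auto_approver', :if => '${f:amount} <= ${v:approval_threshold}'
    participant 'manual_reviewer', :if => '${f:amount} > ${v:approval_threshold}'

    set 'f:status' => 'reviewed'
  end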
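
The push-fed EM idea I mentioned would be along these lines. 'handle_msg' is just a stand-in for whatever really dispatches the msg, and the part that pushes onto the queue (a storage tailer, a pub/sub channel, ...) is hand-waved away:

  require 'eventmachine'

  msgs = EM::Queue.new   # new msg documents get pushed here as they appear

  handle_msg = lambda do |msg|
    # dispatch the msg (stand-in for the real work a worker does)
  end

  consume = lambda do
    msgs.pop do |msg|            # fires as soon as something is pushed, no polling
      handle_msg.call(msg)
      EM.next_tick(&consume)     # go back to waiting for the next msg
    end
  end

  EM.run do
    consume.call
    # elsewhere: msgs.push(msg) whenever a new msg document shows up
  end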
On Nov 14, 12:41 pm, John Mettraux <[email protected]> wrote:
> On Mon, Nov 14, 2011 at 09:14:00AM -0800, Nathan Stults wrote:
>
> > John, thank you for all the pointers. Today we will set up a test
> > environment to take measurements and apply some realistic loads and take
> > a closer look at all the points you mentioned. One question on the
> > schedules - if the behavior of a worker is to pull all schedules and
> > fire triggered ones, how does this work in a multi-worker environment?
> > Is that what "reserve" is used for in the storage? (We haven't
> > implemented reserve in MongoDB, but probably should)
>
> Hello,
>
> yes Storage#reserve(doc) is meant to return true if the worker has
> successfully reserved the document for its own use. It's very important for
> multi-worker storages to implement this method correctly. If it returns true
> twice for the same doc (msg or schedule) you'll end up with a workflow
> operation being performed twice (branches popping out of nowhere) and
> schedules triggering twice.
>
> Maybe simply fixing #get_schedules will yield sufficient gain so that you can
> stick with one worker. We'll see.
>
> Best regards,
>
> --
> John Mettraux - http://lambda.io/processi
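
PS: going by your description of reserve, I'm picturing something roughly like this for the Mongo storage. This is an untested sketch (not what we actually run): claim the doc with a single safe remove keyed on _id and _rev, and only return true if this worker is the one that removed it.

  # untested sketch of Storage#reserve for a MongoDB-backed storage:
  # atomically remove the msg/schedule doc; if our remove hit exactly
  # one document, we won the race and may process it.
  def reserve(doc)
    r = @db.collection(doc['type']).remove(
      { '_id' => doc['_id'], '_rev' => doc['_rev'] },
      :safe => true)
    r.is_a?(Hash) && r['n'] == 1
  end

Does that look like the right idea?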
