Hi John.

We've been hammering at this all week. We updated our MongoDB adapter to fix the schedule loop, made some adjustments to make it a bit faster, and introduced a locking scheme for multi-worker concurrency. We tried the Redis storage, but for some reason it wasn't processing all of our messages during our load tests. That could be user error, but in any case we're sticking with the Mongo one for now, even though I think it is probably somewhat slower.

When loaded up with several simultaneous, large workflow launches that each produce a number (8-10) of additional work items, things are still pretty slow. I noticed that the workflows slowest to go from launch to equilibrium have a large number of "set" expressions setting variables and fields. We also have a lot of participant (and other) expressions that are conditional via "if" and are usually skipped. We have used these pretty liberally in our workflow code (rough sketch below).

In profiling, it turns out that nearly every variable set *appears* to cause the process to persist via a "put". I think I can mitigate this to some extent by combining evented IO via EventMachine with a worker implementation that puts message dispatch into a push-fed EM event loop instead of the standard polling loop (also sketched below), but I get the feeling the JSON serialization / deserialization cost is adding up, and that of course is CPU bound. If I modify the ruote code to force 'should_persist' to false in 'un_set_variable', the difference in performance is dramatic, but I bet my tests wouldn't pass that way, and I'm actually unsure of the ramifications.

My question is: when does ruote decide it needs to persist? My guess would have been that persistence only occurs just before a workflow process is unloaded, i.e. when all paths have reached a dead end requiring external stimulus, but that doesn't seem to be the case. We have a lot of business rules modeled with flow expressions and variable sets, as well as a lot of conditional participant expressions, and I figured these were probably nearly free from a performance perspective. If that's not the case, though (for instance, if those branches and variable sets are actually causing ruote to save the document and put a continuation on the message queue), we may need to refactor our workflows to move all those business rule calculations into external helpers.

I will say this, though: digging through ruote's code and tests is teaching me a lot. Reading good code is always such a rewarding experience.

Thank you so much for your time,

Nathan
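
For concreteness, here is roughly the shape of our process definitions. The names, fields and thresholds are invented, but the proportion of "set" expressions and usually-skipped conditional participants is representative:

  require 'ruote'

  # invented example, just to show the shape of what we do a lot of
  pdef = Ruote.process_definition :name => 'order_review' do

    set 'v:approval_threshold' => 500
    set 'f:status' => 'pending'

    # one of these two is almost always skipped, depending on the fields
    participant 'auto_approver', :if => '${f:amount} <= ${v:approval_threshold}'
    participant 'manual_reviewer', :if => '${f:amount} > ${v:approval_threshold}'

    set 'f:status' => 'reviewed'
  end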
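
The push-fed EM idea I mentioned would be along these lines. 'handle_msg' is just a stand-in for whatever really dispatches the msg, and the part that pushes onto the queue (a storage tailer, a pub/sub channel, ...) is hand-waved away:

  require 'eventmachine'

  msgs = EM::Queue.new   # new msg documents get pushed here as they appear

  handle_msg = lambda do |msg|
    # dispatch the msg (stand-in for the real work a worker does)
  end

  consume = lambda do
    msgs.pop do |msg|            # fires as soon as something is pushed, no polling
      handle_msg.call(msg)
      EM.next_tick(&consume)     # go back to waiting for the next msg
    end
  end

  EM.run do
    consume.call
    # elsewhere: msgs.push(msg) whenever a new msg document shows up
  end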
On Nov 14, 12:41 pm, John Mettraux <[email protected]> wrote:
> On Mon, Nov 14, 2011 at 09:14:00AM -0800, Nathan Stults wrote:
>
> > John, thank you for all the pointers. Today we will set up a test
> > environment to take measurements and apply some realistic loads and take
> > a closer look at all the points you mentioned. One question on the
> > schedules - if the behavior of a worker is to pull all schedules and
> > fire triggered ones, how does this work in a multi-worker environment?
> > Is that what "reserve" is used for in the storage? (We haven't
> > implemented reserve in MongoDB, but probably should)
>
> Hello,
>
> yes Storage#reserve(doc) is meant to return true if the worker has
> successfully reserved the document for its own use. It's very important for
> multi-worker storages to implement this method correctly. If it returns true
> twice for the same doc (msg or schedule) you'll end up with a workflow
> operation being performed twice (branches popping out of nowhere) and
> schedules triggering twice.
>
> Maybe simply fixing #get_schedules will yield sufficient gain so that you can
> stick with one worker. We'll see.
>
> Best regards,
>
> --
> John Mettraux - http://lambda.io/processi
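
PS: going by your description of reserve, I'm picturing something roughly like this for the Mongo storage. This is an untested sketch (not what we actually run): claim the doc with a single safe remove keyed on _id and _rev, and only return true if this worker is the one that removed it.

  # untested sketch of Storage#reserve for a MongoDB-backed storage:
  # atomically remove the msg/schedule doc; if our remove hit exactly
  # one document, we won the race and may process it.
  def reserve(doc)
    r = @db.collection(doc['type']).remove(
      { '_id' => doc['_id'], '_rev' => doc['_rev'] },
      :safe => true)
    r.is_a?(Hash) && r['n'] == 1
  end

Does that look like the right idea?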
