On Tue, Dec 18, 2012 at 08:13:14PM +0100, Mario Camou wrote:
>
> We're having some performance issues in a specific workflow, where Ruote
> will sometimes start going very slow (i.e., 1-2 seconds between messages
> with RUOTE_NOISY=true).

Hello Mario,

Is that specific workflow long? How many lines in the process definitions?
Do the workitems have a big payload?

> Part of the problem seems to be memory consumption (we're running on JRuby
> 1.7 on top of JDK 7). With this single workflow running, we were running
> out of heap space (set to 640M). We upped the heap to 768M and moved
> from DefaultHistory to StorageHistory (we're using RuoteMon for storage).
> That allowed us to complete the workflow without having an
> OutOfMemoryError. However, using JConsole to monitor memory in real time I
> see that memory usage will spike to almost all the heap, and then it seems
> like the JVM is thrashing, doing a GC (which takes it down to ~500M) and
> then filling back up, every 4 seconds.
>
> With the Eclipse Heap Analyzer I've seen that the biggest culprit is the
> WaitLogger. The comments in the code say that it stores the last 147
> messages for this particular worker, but it isn't clear exactly what it
> does and what might happen if we bump that number down.

Yes, feel free to cut down the 'wait_logger_max' to 0. It should have no bad
effect, unless you use Ruote::Dashboard#wait_for in production (outside of
tests).
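
To make the effect of the cap concrete, here is a minimal sketch of a capped
message log. It is not ruote's actual WaitLogger code; the CappedLog class and
its methods are made up for illustration. It only shows why a smaller maximum
keeps fewer messages in memory, and why 0 keeps none.

```ruby
# Hypothetical illustration only, not the ruote implementation.
class CappedLog
  attr_reader :entries

  def initialize(max)
    @max = max      # plays the role of 'wait_logger_max'
    @entries = []
  end

  # Appends a message, then drops the oldest entries beyond the cap.
  def <<(msg)
    @entries << msg
    @entries.shift while @entries.size > @max
    self
  end
end

log = CappedLog.new(3)
(1..5).each { |i| log << { 'action' => 'dispatch', 'n' => i } }
p log.entries.map { |m| m['n'] }   # only the 3 most recent are retained

none = CappedLog.new(0)
none << { 'action' => 'dispatch' }
p none.entries.size                # nothing is retained with a cap of 0
```

With a cap of 0 nothing is retained, so anything that needs to look back at
recent messages (as wait_for does) has nothing to look at.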

To avoid such bad surprises I cut the default from 147 to 56 on ruote master:

  https://github.com/jmettraux/ruote/commit/327a64f6f7afed685d108ffe425c6651ecb83914

Please tell me how it goes.

> Another thing that would be useful is to have a description of the columns
> that appear in the noisy log. So, for example:
>
> 9 00:23.101 20         di * 20121218-1829-jimisuzo-hetsujemo 60311 0_10_1_2
> fill_form wi: [0_10_1_2!60311...!, 7], part:
> [Abstra::RuoteProject::Participants::PubSubAuthStorageParticipant, {}]
>
> What is the timestamp? The number after it? The abbreviation seems to be
> some sort of code specifying which part of the processing is going on (I've
> seen ap, which I think is something like "accepted", re, which I assume is
> "reply", and di, which might mean "dispatch", but there's also rc, dd, ce,
> la, etc.). Then comes the WF name, some hex number, the expid, and a bunch of
> parameters. I'd like to know a bit more about this to dig in deeper to see
> where exactly the performance problems lie.

I've started putting a page together about that:

  http://ruote.rubyforge.org/noisy.html

Feedback is welcome.
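
In the meantime, here is a rough sketch for splitting such a line into columns
and measuring the gap between consecutive messages. The field names are guesses
based on your description (col1, col3 and flag are placeholders for columns
whose meaning isn't stated here). It assumes the timestamp reads as mm:ss.mmm
and that every line has the same whitespace-separated columns as your sample,
joined back onto a single line.

```ruby
# Field names are assumptions, not official ruote terminology.
NoisyLine = Struct.new(
  :col1, :stamp, :col3, :action, :flag, :wfid, :subid, :expid, :rest)

# Splits one noisy log line into at most 9 whitespace-separated fields;
# everything after the expid ends up in :rest.
def parse_noisy(line)
  NoisyLine.new(*line.strip.split(/\s+/, 9))
end

# Converts a stamp like "00:23.101" to seconds, assuming mm:ss.mmm.
def stamp_to_seconds(stamp)
  minutes, seconds = stamp.split(':')
  minutes.to_i * 60 + seconds.to_f
end

# Returns [gap_in_seconds, action] pairs for consecutive lines,
# which makes slow steps easier to spot.
def gaps(lines)
  parsed = lines.map { |l| parse_noisy(l) }
  parsed.each_cons(2).map do |a, b|
    [stamp_to_seconds(b.stamp) - stamp_to_seconds(a.stamp), b.action]
  end
end

sample =
  '9 00:23.101 20         di * 20121218-1829-jimisuzo-hetsujemo 60311 ' \
  '0_10_1_2 fill_form wi: [0_10_1_2!60311...!, 7]'

fields = parse_noisy(sample)
puts fields.action   # the two-letter code
puts fields.expid    # the expression id
```

Sorting the output of gaps by the first element should point at the messages
that were preceded by the longest pauses.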


Feliz Navidad,

--
John Mettraux - http://lambda.io/jmettraux