Great news, Corey: after a little bit of playing around I've found what
seems to be a possible solution.

I reworked the code in MsmqLoadBalancer so that after a number of
failures to contact a worker it pauses the thread for a second and
resets the failure count back to zero. With that change the load
balancer's CPU usage dropped to around 7%.

It worked perfectly in the situation where one worker was busy and
another worker process was started: the queue backlog was cleared
without the load balancer trying to hog the system.

My code for the change to MsmqLoadBalancer.HandleStandardMessage can
be found here: http://pastebin.com/0PbC6ecB


On Nov 17, 12:56 am, Corey Kaylor <[email protected]> wrote:
> Ok, I'll take a look when I get into the office. I may suggest changes to
> make and have you try them out. I have run into similar issues with rhino
> queues being too eager in peeking messages in the past.
>
> On Wed, Nov 16, 2011 at 1:00 AM, Michael Lyons <[email protected]> wrote:
> > Sorry about that last message, for some reason it lost its formatting
>
> > On Nov 16, 6:44 pm, Michael Lyons <[email protected]> wrote:
> > > Strangely enough I'm going to be testing load balancing next week
> > > across physical servers, as I provisioned another server last week
> > > for the staging environment to test this out.
> > > In our case the workers get tied up as they are contacting website
> > > services which can sometimes be really slow (up to 120 seconds),
> > > causing the load balancer's queue to grow. My idea with the load
> > > balancer was that I could spin up a new worker process when the queue
> > > becomes too large, which is what I can do currently and it works
> > > perfectly; it's just that the load balancer is consuming more
> > > resources than it needs to while the machine is really not under any
> > > other stress.
> > > I've just done some quick profiling and all the action seems to be
> > > called from AbstractMsmqListener.PeekMessageOnBackgroundThread. It
> > > spends 53% of its time in calls to
> > > MsmqLoadBalancer.HandlePeekedMessage and its children, with the
> > > remaining 47% in AbstractMsmqListener.TryPeek and its children.
> > > So over a total period of 4 minutes RSB consumed 183 seconds out of
> > > 240 seconds of CPU time, excluding my app's time, which I think is a
> > > bit excessive, particularly since it peeked at 226,130 messages.
> > > Shouldn't the load balancer pause for a second if it fails to get in
> > > contact with any of the workers, instead of just blindly retrying?
> > > Here are the top offenders in csv format - if you want I can email you
> > > a full csv (it's actually tab delimited) or a pdf.
> > > Total Time with children (ms), Average Time with children (ms), Total for self (ms), Average for self (ms), Calls, Method name
> > > 183366, 0.8, 11384, 0.1, 226122, Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.HandlePeekedMessage(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message)
> > > 160651, 0.7, 4743, 0, 226130, Rhino.ServiceBus.Msmq.AbstractMsmqListener.TryPeek(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message&)
> > > 155724, 0.7, 155724, 0.7, 226130, Rhino.ServiceBus.Msmq.OpenedQueue.Peek(System.TimeSpan)
> > > 134270, 0.6, 134138, 0.6, 226125, Rhino.ServiceBus.Msmq.OpenedQueue.TryGetMessageFromQueue(System.String)
> > > 29787, 0.2, 1159, 0, 180430, Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.HandleStandardMessage(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message)
> > > 28569, 0.2, 28546, 0.2, 180430, Rhino.ServiceBus.Msmq.OpenedQueue.Send(System.Messaging.Message)
> > > 7759, 0, 1312, 0, 180432, Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.PersistEndpoint(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message)
> > > 3825, 0, 3825, 0, 180432, Rhino.ServiceBus.Msmq.MsmqUtil.GetQueueUri(System.Messaging.MessageQueue)
> > > 2622, 0, 2622, 0, 180431, Rhino.ServiceBus.DataStructures.Set`1.Add(T)
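[To put the profiler figures quoted above in perspective, this is simple arithmetic on the numbers Michael reported: 226,130 peeks and 183 CPU-seconds over a 4-minute window.]

```python
peeks = 226130           # messages peeked in the profiling window
window_seconds = 4 * 60  # the 4-minute window quoted above
rsb_cpu_seconds = 183    # CPU time attributed to RSB

peek_rate = peeks / window_seconds           # roughly 942 peeks per second
cpu_share = rsb_cpu_seconds / window_seconds # roughly 76% of one core

print(f"{peek_rate:.0f} peeks/sec, {cpu_share:.0%} of CPU in RSB")
```

Nearly a thousand peeks per second on an idle-but-backlogged queue is what makes the blind-retry loop look like a busy-wait rather than useful work.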
> > > On Nov 16, 4:39 pm, Corey Kaylor <[email protected]> wrote:
>
> > > > I am happy to take any form of contribution you can offer.
>
> > > > By adding additional worker endpoints I mean.
>
> > > > Load Balancer 1, 5 threads, deployed to MachineA
> > > >   Worker endpoint 1, configured to send to Machine1\queue1.readyforwork, 5 threads, deployed to NewMachineB
> > > >   Worker endpoint 2, configured to send to Machine1\queue1.readyforwork, 5 threads, deployed to NewMachineC
>
> > > > Load balancing, although completely *possible* to run on one machine,
> > > > was designed to distribute load across multiple machines. You're not
> > > > gaining any benefit from load balancing when there is only one worker
> > > > sending ready-for-work messages to the load balancer. You would be
> > > > better off in this case just having two endpoints without load balancing.
>
> > > > On Tue, Nov 15, 2011 at 10:28 PM, Michael Lyons <[email protected]> wrote:
>
> > > > > I've run the EQATEC profiler against the code, and when the load
> > > > > balancer process is under load it records no activity between
> > > > > snapshots, indicating it is sitting in RSB code.
>
> > > > > I'd be happy to spot profile RSB in my app and point out where the
> > > > > high CPU is coming from but I'm assuming you already have a fair
> > idea.
>
> > > > > What do you mean by adding additional worker endpoints? Can you point
> > > > > me to an example.
>
> > > > > On Nov 16, 3:40 pm, Corey Kaylor <[email protected]> wrote:
> > > > > > I would try changing the thread counts on the consumers and the
> > load
> > > > > > balancer, and possibly add additional worker endpoint(s).
>
> > > > > > Ayende in previous conversations has recommended thread counts
> > > > > > equal to the number of cores on the machine. I have found that
> > > > > > isn't always a perfect recipe, so in our case we have run load
> > > > > > tests, changing the thread configuration for each machine.
>
> > > > > > When changing the thread counts on each test run, try to observe
> > which
> > > > > > specific process is utilizing the most CPU.
>
> > > > > > There may be places to optimize for sure, but it sounds to me like
> > > > > threads
> > > > > > are competing for priority.
>
> > > > > > On Tue, Nov 15, 2011 at 9:24 PM, Michael Lyons <
> > [email protected]>
> > > > > wrote:
> > > > > > > Yes you're correct, it's a staging environment where we do our
> > testing
> > > > > > > before releasing into production.
>
> > > > > > > That's pretty much the situation.
>
> > > > > > > Here are the xml configurations for the 2 load balancers:
>
> > > > > > >    <loadBalancer threadCount="5"
> > > > > > >                  endpoint="msmq://localhost/notifier.loadbalancer"
> > > > > > >                  readyForWorkEndpoint="msmq://localhost/notifier.loadbalancer.acceptingwork"
> > > > > > >                  />
>
> > > > > > >    <loadBalancer threadCount="5"
> > > > > > >                  endpoint="msmq://localhost/processor.loadbalancer"
> > > > > > >                  readyForWorkEndpoint="msmq://localhost/processor.loadbalancer.acceptingwork"
> > > > > > >                  />
>
> > > > > > > Consumers xml configuration is:
>
> > > > > > >    <bus threadCount="20"
> > > > > > >         loadBalancerEndpoint="msmq://localhost/processor.loadbalancer.acceptingwork"
> > > > > > >         numberOfRetries="5"
> > > > > > >         endpoint="msmq://localhost/processor"
> > > > > > >         />
>
> > > > > > >    <bus threadCount="20"
> > > > > > >         loadBalancerEndpoint="msmq://localhost/notifier.loadbalancer.acceptingwork"
> > > > > > >         numberOfRetries="5"
> > > > > > >         endpoint="msmq://localhost/notifier"
> > > > > > >         />
>
> > > > > > > On Nov 16, 3:13 pm, Corey Kaylor <[email protected]> wrote:
> > > > > > > > To summarize your setup.
>
> > > > > > > > Load Balancer 1, configured for messages belonging to
> > NamespaceA,
> > > > > with 5
> > > > > > > > threads, deployed to MachineA\queue1
> > > > > > > >    1 worker endpoint sending ready for work to
> > > > > > > > MachineA\queue1.readyforwork, configured with 20 threads,
> > > > > > > > deployed to MachineA
>
> > > > > > > > Load Balancer 2, configured for messages belonging to
> > NamespaceB,
> > > > > with 5
> > > > > > > > threads, deployed to MachineA\queue2
> > > > > > > >    1 worker endpoint sending ready for work to
> > > > > > > > MachineA\queue2.readyforwork, configured with 20 threads,
> > deployed to
> > > > > > > > MachineA
>
> > > > > > > > I assumed by staging server you mean a staging environment
> > > > > > > > that is configured similarly to the above but with different
> > > > > > > > machine specs, as you've stated.
>
> > > > > > > > Is this correct?
>
> > > > > > > On Tue, Nov 15, 2011 at 8:12 PM, Michael Lyons <[email protected]> wrote:
> > > > > > > > > The load balancers are configured with the
> > readyForWorkEndpoint
> > > > > > > > > attribute on the loadBalancer xml element.
>
> > > > > > > > > The system is a quad-core 2.83GHz Core 2 Duo. On the staging
> > > > > > > > > server, which is running an older single-core 2.8GHz Xeon
> > > > > > > > > (Dell 2650) with hyper-threading, it sits at about 80%, and
> > > > > > > > > in production it sits between 40% and 80% on a quad-core
> > > > > > > > > 2.8GHz Xeon (Dell R210) where it is allocated 2 cores.
>
> > > > > > > > > Forgot to mention that RSB is version 2.2
>
> > > > > > > > > On Nov 16, 1:17 pm, Corey Kaylor <[email protected]> wrote:
> > > > > > > > > > Also, how many cores are on the load balancer machine?
> > There
> > > > > > > shouldn't be
> > > > > > > > > > that much demand on the cpu, but having said that it really
> > > > > depends
> > > > > > > on
> > > > > > > > > the
> > > > > > > > > > circumstances and environment.
>
> > > > > > > > > > On Tue, Nov 15, 2011 at 7:15 PM, Corey Kaylor <[email protected]> wrote:
> > > > > > > > > > > Is each load balancer configured with a ready for work
> > uri?
>
> > > > > > > > > > > On Mon, Nov 14, 2011 at 5:06 PM, Michael Lyons <[email protected]> wrote:
>
> > > > > > > > > > >> When using the load balancer with RSB I'm seeing the
> > > > > > > > > > >> CPU run at near 100% when the consumers are all busy,
> > > > > > > > > > >> which causes the consumers to run slower and be free
> > > > > > > > > > >> less often.
> > > > > > > > > > >> It can be simulated easily by setting up a load
> > balancer with
> > > > > no
> > > > > > > > > > >> consumers listening to it and trying to send out some
> > > > > messages to
> > > > > > > the
> > > > > > > > > > >> consumer.
>
> > > > > > > > > > >> In my specific situation I have 2 load balancers with 5
> > > > > threads
> > > > > > > each
> > > > > > > > > > >> (each load balancer runs a separate queue
>
> ...

-- 
You received this message because you are subscribed to the Google Groups 
"Rhino Tools Dev" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/rhino-tools-dev?hl=en.
