[rhino-tools-dev] Re: Rhino Service Bus with Load Balancer pegs CPU around 100%

Michael Lyons Wed, 16 Nov 2011 00:01:13 -0800

Sorry about that last message; for some reason it lost its formatting.

On Nov 16, 6:44 pm, Michael Lyons <[email protected]> wrote:
> Strangely enough, I'm going to be testing load balancing across
> physical servers next week, as I provisioned another server last week
> for the staging environment to test this out.
>
> In our case the workers get tied up because they are contacting website
> services which can sometimes be really slow (up to 120 seconds),
> causing the load balancer's queue to grow. My idea with the load
> balancer was that I can spin up a new worker process when the queue
> becomes too large, which is what I can do currently and it works
> perfectly. It's just that the load balancer is consuming more
> resources than it needs to while the machine is really not under any
> other stress.
>
> I've just done some quick profiling and all the action seems to be
> called from AbstractMsmqListener.PeekMessageOnBackgroundThread. It
> spends 53% of its time in calls to
> MsmqLoadBalancer.HandlePeekedMessage and its children, with the
> remaining 47% in AbstractMsmqListener.TryPeek and its children.
>
> So over a total period of 4 minutes RSB consumed 183 seconds out of
> 240 seconds of CPU time, excluding my app's time, which I think is a
> bit excessive, particularly since it peeked at 226130 messages.
>
> Shouldn't the load balancer pause for a second if it failed to get in
> contact with any of the workers, instead of just blindly retrying?
>
> Here are the top offenders in csv format. If you want, I can email you
> a full csv (it's actually tab delimited) or a pdf.
>
> Total Time with children (ms), Average Time with children (ms), Total for self (ms), Average for self (ms), Calls, Method name
> 183366,0.8,11384,0.1,226122,Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.HandlePeekedMessage(Rhino.ServiceBus.Msmq.OpenedQueue,System.Messaging.Message)
> 160651,0.7,4743,0,226130,Rhino.ServiceBus.Msmq.AbstractMsmqListener.TryPeek(Rhino.ServiceBus.Msmq.OpenedQueue,System.Messaging.Message&)
> 155724,0.7,155724,0.7,226130,Rhino.ServiceBus.Msmq.OpenedQueue.Peek(System.TimeSpan)
> 134270,0.6,134138,0.6,226125,Rhino.ServiceBus.Msmq.OpenedQueue.TryGetMessageFromQueue(System.String)
> 29787,0.2,1159,0,180430,Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.HandleStandardMessage(Rhino.ServiceBus.Msmq.OpenedQueue,System.Messaging.Message)
> 28569,0.2,28546,0.2,180430,Rhino.ServiceBus.Msmq.OpenedQueue.Send(System.Messaging.Message)
> 7759,0,1312,0,180432,Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.PersistEndpoint(Rhino.ServiceBus.Msmq.OpenedQueue,System.Messaging.Message)
> 3825,0,3825,0,180432,Rhino.ServiceBus.Msmq.MsmqUtil.GetQueueUri(System.Messaging.MessageQueue)
> 2622,0,2622,0,180431,Rhino.ServiceBus.DataStructures.Set`1.Add(T)
>
> On Nov 16, 4:39 pm, Corey Kaylor <[email protected]> wrote:
>
> > I am happy to take any form of contribution you can offer.
>
> > By adding additional worker endpoints I mean.
>
> > Load Balancer 1, 5 threads, deployed to MachineA
> >   1 worker endpoint, configured to send to Machine1\queue1.readyforwork,
> >     5 threads, deployed to NewMachineB
> >   2 worker endpoint, configured to send to Machine1\queue1.readyforwork,
> >     5 threads, deployed to NewMachineC
>
> > Load balancing, although completely *possible* to run on one machine, was
> > designed to distribute load to multiple machines. You're not gaining any
> > benefits from load balancing when there is only one worker sending ready
> > for work messages to the load balancer. You would be better off in this
> > case just having two endpoints without load balancing.
>
> > On Tue, Nov 15, 2011 at 10:28 PM, Michael Lyons <[email protected]> wrote:
>
> > > I've run EQATEC profiler against the code, and when the load balancer
> > > process is under load it records no activity between snapshots,
> > > indicating it is sitting in RSB code.
>
> > > I'd be happy to spot profile RSB in my app and point out where the
> > > high CPU is coming from but I'm assuming you already have a fair idea.
>
> > > What do you mean by adding additional worker endpoints? Can you point
> > > me to an example?
>
> > > On Nov 16, 3:40 pm, Corey Kaylor <[email protected]> wrote:
> > > > I would try changing the thread counts on the consumers and the load
> > > > balancer, and possibly add additional worker endpoint(s).
>
> > > > Ayende in previous conversations has recommended thread counts that are
> > > > equal to the number of cores on the machine. I have found that isn't
> > > > always a perfect recipe. So in our case we have run load tests, changing
> > > > the configuration of threads for each machine.
>
> > > > When changing the thread counts on each test run, try to observe which
> > > > specific process is utilizing the most CPU.
>
> > > > There may be places to optimize for sure, but it sounds to me like
> > > > threads are competing for priority.
>
> > > > On Tue, Nov 15, 2011 at 9:24 PM, Michael Lyons <[email protected]> wrote:
> > > > > Yes you're correct, it's a staging environment where we do our testing
> > > > > before releasing into production.
>
> > > > > That's pretty much the situation.
>
> > > > > Here are the xml configurations for the 2 load balancers:
>
> > > > >    <loadBalancer threadCount="5"
> > > > >                  endpoint="msmq://localhost/notifier.loadbalancer"
> > > > >                  readyForWorkEndpoint="msmq://localhost/notifier.loadbalancer.acceptingwork"
> > > > >                  />
>
> > > > >    <loadBalancer threadCount="5"
> > > > >                  endpoint="msmq://localhost/processor.loadbalancer"
> > > > >                  readyForWorkEndpoint="msmq://localhost/processor.loadbalancer.acceptingwork"
> > > > >                  />
>
> > > > > Consumers xml configuration is:
>
> > > > >    <bus threadCount="20"
> > > > >         loadBalancerEndpoint="msmq://localhost/processor.loadbalancer.acceptingwork"
> > > > >         numberOfRetries="5"
> > > > >         endpoint="msmq://localhost/processor"
> > > > >         />
>
> > > > >    <bus threadCount="20"
> > > > >         loadBalancerEndpoint="msmq://localhost/notifier.loadbalancer.acceptingwork"
> > > > >         numberOfRetries="5"
> > > > >         endpoint="msmq://localhost/notifier"
> > > > >         />
>
> > > > > On Nov 16, 3:13 pm, Corey Kaylor <[email protected]> wrote:
> > > > > > To summarize your setup.
>
> > > > > > Load Balancer 1, configured for messages belonging to NamespaceA, with 5
> > > > > > threads, deployed to MachineA\queue1
> > > > > >    1 worker endpoint sending ready for work to
> > > > > >    MachineA\queue1.readyforwork, configured with 20 threads, deployed to
> > > > > >    MachineA
>
> > > > > > Load Balancer 2, configured for messages belonging to NamespaceB, with 5
> > > > > > threads, deployed to MachineA\queue2
> > > > > >    1 worker endpoint sending ready for work to
> > > > > >    MachineA\queue2.readyforwork, configured with 20 threads, deployed to
> > > > > >    MachineA
>
> > > > > > I assume that by staging server you mean a staging environment that is
> > > > > > configured similarly to the above, but with different machine specs, as
> > > > > > you've stated.
>
> > > > > > Is this correct?
>
> > > > > > On Tue, Nov 15, 2011 at 8:12 PM, Michael Lyons <[email protected]> wrote:
> > > > > > > The load balancers are configured with the readyForWorkEndpoint
> > > > > > > attribute on the loadBalancer xml element.
>
> > > > > > > The system is a quad core 2.83GHz Core 2 Duo. On the staging server,
> > > > > > > which is running an older single core 2.8GHz Xeon (Dell 2650) with
> > > > > > > hyper-threading, it sits at about 80%, and in production it sits
> > > > > > > between 40 and 80% on a quad core 2.8GHz Xeon (Dell R210) where it is
> > > > > > > allocated 2 cores.
>
> > > > > > > Forgot to mention that RSB is version 2.2
>
> > > > > > > On Nov 16, 1:17 pm, Corey Kaylor <[email protected]> wrote:
> > > > > > > > Also, how many cores are on the load balancer machine? There shouldn't
> > > > > > > > be that much demand on the CPU, but having said that, it really depends
> > > > > > > > on the circumstances and environment.
>
> > > > > > > > On Tue, Nov 15, 2011 at 7:15 PM, Corey Kaylor <[email protected]> wrote:
> > > > > > > > > Is each load balancer configured with a ready for work uri?
>
> > > > > > > > > On Mon, Nov 14, 2011 at 5:06 PM, Michael Lyons <[email protected]> wrote:
>
> > > > > > > > > >> When using the load balancer with RSB I'm seeing the CPU run at near
> > > > > > > > > >> 100% when the consumers are all busy, which causes the consumers to run
> > > > > > > > > >> slower and be free less often.
> > > > > > > > > >> It can be simulated easily by setting up a load balancer with no
> > > > > > > > > >> consumers listening to it and trying to send out some messages to the
> > > > > > > > > >> consumer.
>
> > > > > > > > > >> In my specific situation I have 2 load balancers with 5 threads each
> > > > > > > > > >> (each load balancer runs a separate queue with different types of
> > > > > > > > > >> messages), and there is a consumer waiting at the other end of each load
> > > > > > > > > >> balancer with another 20 threads each. If one of the load balancers gets
> > > > > > > > > >> congested then all consumers run slow. When I ran the load balancer
> > > > > > > > > >> without load it averaged ~200ms to process a message. Once the load
> > > > > > > > > >> balancer was under load (achieved by queuing over 1000 messages) it
> > > > > > > > > >> averaged ~1750ms, which results in the user waiting 8 times longer
> > > > > > > > > >> for their tasks to complete.
>
> > > > > > > > > >> Is there any way around this?
>
> > > > > > > > >> --
> > > > > > > > >> You received this message because you are subscribed to the
> > > Google
> > > > > > > Groups
>
> ...
>
> read more »

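On the question quoted above of whether the load balancer should pause rather than blindly retry when no worker is available, here is a minimal sketch of the idea in Python. It is not Rhino Service Bus code and does not reflect how MsmqLoadBalancer is implemented: `try_dispatch`, `next_delay`, `dispatch_with_backoff`, and the 50 ms / 1 s limits are all hypothetical names and values picked for illustration.

```python
import time
from typing import Callable, List


def next_delay(current: float, base: float = 0.05, cap: float = 1.0) -> float:
    """Return the next pause: start at base, double each time, never exceed cap."""
    if current <= 0:
        return base
    return min(current * 2, cap)


def dispatch_with_backoff(
    try_dispatch: Callable[[], bool],
    max_attempts: int,
    sleep: Callable[[float], None] = time.sleep,
    base: float = 0.05,
    cap: float = 1.0,
) -> List[float]:
    """Call try_dispatch until it succeeds or max_attempts is reached.

    try_dispatch is a hypothetical callback that returns True when a
    worker accepted the message and False when none was available.
    After each failure the loop sleeps instead of retrying immediately.
    Returns the list of pauses taken, in seconds.
    """
    delays: List[float] = []
    delay = 0.0
    for _ in range(max_attempts):
        if try_dispatch():
            break
        delay = next_delay(delay, base, cap)
        delays.append(delay)
        sleep(delay)
    return delays
```

With no worker available, the pauses grow from 50 ms up to the 1 second cap, so an idle retry loop would make roughly one attempt per second instead of hundreds.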

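For reference, the figures quoted above can be checked with a few lines of arithmetic. This only uses numbers already reported in the thread (the profiler totals in ms, the 4 minute window, and the ~200ms and ~1750ms averages); the variable names are my own.

```python
# Figures copied from the quoted profile (times in ms, over a 240 s window).
handle_peeked_ms = 183366   # HandlePeekedMessage, total time with children
try_peek_ms = 160651        # TryPeek, total time with children
peek_calls = 226130         # number of TryPeek calls reported
window_s = 240              # 4 minute profiling window

# Split of time between the two call trees under PeekMessageOnBackgroundThread.
total_ms = handle_peeked_ms + try_peek_ms
handle_share = handle_peeked_ms / total_ms
try_peek_share = try_peek_ms / total_ms

# Share of the window reported as RSB CPU time (183 s out of 240 s).
cpu_fraction = 183 / window_s

# How often the queue was peeked on average.
peeks_per_second = peek_calls / window_s

# Slowdown from the original post: ~200 ms idle vs ~1750 ms under load.
slowdown = 1750 / 200

print(f"HandlePeekedMessage share: {handle_share:.0%}")
print(f"TryPeek share: {try_peek_share:.0%}")
print(f"RSB share of window: {cpu_fraction:.0%}")
print(f"Peeks per second: {peeks_per_second:.0f}")
print(f"Slowdown under load: {slowdown:.2f}x")
```

The split comes out at about 53% and 47%, matching the percentages quoted, and the peek rate works out to roughly 940 peeks per second over the window.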
-- 
You received this message because you are subscribed to the Google Groups 
"Rhino Tools Dev" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/rhino-tools-dev?hl=en.
