Re: [rhino-tools-dev] Re: Rhino Service Bus with Load Balancer pegs CPU around 100%

Corey Kaylor Wed, 16 Nov 2011 05:56:53 -0800

Ok, I'll take a look when I get into the office. I may suggest changes to
make and have you try them out. I have run into similar issues with rhino
queues being too eager in peeking messages in the past.
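
As a rough illustration of the pause-before-retry idea raised in the quoted
message below, here is a minimal sketch. It is not Rhino Service Bus code and
it is written in Python rather than C#. Every name in it (next_delay, drain,
max_idle_peeks, the delay values) is hypothetical and chosen only for this
example. It assumes a listener loop that forwards pending messages to ready
workers and, when no worker is ready, sleeps for a growing interval instead of
peeking again immediately.

```python
import time
from collections import deque


def next_delay(current, base=0.05, cap=1.0):
    """Double the pause after each fruitless peek, up to `cap` seconds."""
    return base if current == 0 else min(current * 2, cap)


def drain(pending, ready_workers, send, sleep=time.sleep, max_idle_peeks=5):
    """Forward pending messages to ready workers, pausing when none are ready.

    Returns the number of peeks performed. The loop gives up after
    `max_idle_peeks` consecutive peeks that found no ready worker
    (a hypothetical limit so the example terminates).
    """
    peeks = 0
    idle = 0
    delay = 0.0
    while pending and idle < max_idle_peeks:
        peeks += 1
        if ready_workers:
            worker = ready_workers.popleft()
            send(worker, pending.popleft())
            idle, delay = 0, 0.0  # progress was made, so reset the backoff
        else:
            idle += 1
            delay = next_delay(delay)
            sleep(delay)  # pause instead of re-peeking straight away
    return peeks
```

Passing a recording function as `sleep` (for example a list's `append`) makes
it easy to see how many peeks happen and how long the loop would have paused,
without actually waiting.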


On Wed, Nov 16, 2011 at 1:00 AM, Michael Lyons <[email protected]> wrote:

> Sorry about that last message, for some reason it lost its formatting.
>
> On Nov 16, 6:44 pm, Michael Lyons <[email protected]> wrote:
> > Strangely enough I'm going to be testing load balancing next week
> > across physical servers, as I provisioned another server last week
> > for the staging environment to test this out.
> >
> > In our case the workers get tied up as they are contacting website
> > services which can sometimes be really slow (up to 120 seconds),
> > causing the load balancer's queue to grow. My idea with the load
> > balancer was that I can spin up a new worker process when the queue
> > becomes too large, which is what I can do currently and it works
> > perfectly. It's just that the load balancer is consuming more
> > resources than it needs to while the machine is really not under any
> > other stress.
> >
> > I've just done some quick profiling and all the action seems to be
> > called from AbstractMsmqListener.PeekMessageOnBackgroundThread. It
> > spends 53% of its time in calls to
> > MsmqLoadBalancer.HandlePeekedMessage and its children, with the
> > remaining 47% in AbstractMsmqListener.TryPeek and its children.
> >
> > So over a total period of 4 minutes RSB consumed 183 seconds out of
> > 240 seconds of CPU time, excluding my app's time, which I think is a
> > bit excessive, particularly since it peeked at 226130 messages.
> >
> > Shouldn't the load balancer pause for a second if it failed to get in
> > contact with any of the workers, instead of just blindly retrying?
> >
> > Here are the top offenders in csv format - if you want I can email you
> > a full csv (it's actually tab delimited) or a pdf.
> > Total Time with children (ms), Average Time with children (ms), Total for self (ms), Average for self (ms), Calls, Method name
> >
> > 183366,0.8,11384,0.1,226122,Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.HandlePeekedMessage(Rhino.ServiceBus.Msmq.OpenedQueue,System.Messaging.Message)
> > 160651,0.7,4743,0,226130,Rhino.ServiceBus.Msmq.AbstractMsmqListener.TryPeek(Rhino.ServiceBus.Msmq.OpenedQueue,System.Messaging.Message&)
> > 155724,0.7,155724,0.7,226130,Rhino.ServiceBus.Msmq.OpenedQueue.Peek(System.TimeSpan)
> > 134270,0.6,134138,0.6,226125,Rhino.ServiceBus.Msmq.OpenedQueue.TryGetMessageFromQueue(System.String)
> > 29787,0.2,1159,0,180430,Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.HandleStandardMessage(Rhino.ServiceBus.Msmq.OpenedQueue,System.Messaging.Message)
> > 28569,0.2,28546,0.2,180430,Rhino.ServiceBus.Msmq.OpenedQueue.Send(System.Messaging.Message)
> > 7759,0,1312,0,180432,Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.PersistEndpoint(Rhino.ServiceBus.Msmq.OpenedQueue,System.Messaging.Message)
> > 3825,0,3825,0,180432,Rhino.ServiceBus.Msmq.MsmqUtil.GetQueueUri(System.Messaging.MessageQueue)
> > 2622,0,2622,0,180431,Rhino.ServiceBus.DataStructures.Set`1.Add(T)
> >
> > On Nov 16, 4:39 pm, Corey Kaylor <[email protected]> wrote:
> >
> > > I am happy to take any form of contribution you can offer.
> >
> > > By adding additional worker endpoints I mean:
> >
> > > Load Balancer 1, 5 threads, deployed to MachineA
> > >   1 worker endpoint, configured to send to Machine1\queue1.readyforwork,
> > >     5 threads, deployed to NewMachineB
> > >   2 worker endpoint, configured to send to Machine1\queue1.readyforwork,
> > >     5 threads, deployed to NewMachineC
> >
> > > Load balancing, although completely *possible* to run on one machine,
> > > was designed to distribute load to multiple machines. You're not
> > > gaining any benefits from load balancing when there is only one worker
> > > sending ready for work messages to the load balancer. You would be
> > > better off in this case just having two endpoints without load
> > > balancing.
> >
> > > > On Tue, Nov 15, 2011 at 10:28 PM, Michael Lyons <[email protected]> wrote:
> >
> > > > > I've run EQATEC profiler against the code and when the load balancer
> > > > > process is under load it records no activity between snapshots,
> > > > > indicating it is sitting in RSB code.
> >
> > > > > I'd be happy to spot profile RSB in my app and point out where the
> > > > > high CPU is coming from, but I'm assuming you already have a fair
> > > > > idea.
> >
> > > > > What do you mean by adding additional worker endpoints? Can you point
> > > > > me to an example?
> >
> > > > On Nov 16, 3:40 pm, Corey Kaylor <[email protected]> wrote:
> > > > > > I would try changing the thread counts on the consumers and the
> > > > > > load balancer, and possibly add additional worker endpoint(s).
> >
> > > > > > Ayende in previous conversations has recommended thread counts
> > > > > > that are equal to the number of cores on the machine. I have found
> > > > > > that isn't always a perfect recipe. So in our case we have run load
> > > > > > tests, changing the thread configuration for each machine.
> >
> > > > > > When changing the thread counts on each test run, try to observe
> > > > > > which specific process is utilizing the most CPU.
> >
> > > > > > There may be places to optimize for sure, but it sounds to me like
> > > > > > threads are competing for priority.
> >
> > > > > > On Tue, Nov 15, 2011 at 9:24 PM, Michael Lyons <[email protected]> wrote:
> > > > > > > Yes, you're correct, it's a staging environment where we do our
> > > > > > > testing before releasing into production.
> >
> > > > > > That's pretty much the situation.
> >
> > > > > > Here are the xml configurations for the 2 load balancers:
> >
> > > > > > >    <loadBalancer threadCount="5"
> > > > > > >                  endpoint="msmq://localhost/notifier.loadbalancer"
> > > > > > >                  readyForWorkEndpoint="msmq://localhost/notifier.loadbalancer.acceptingwork"
> > > > > > >                  />
> >
> > > > > > >    <loadBalancer threadCount="5"
> > > > > > >                  endpoint="msmq://localhost/processor.loadbalancer"
> > > > > > >                  readyForWorkEndpoint="msmq://localhost/processor.loadbalancer.acceptingwork"
> > > > > > >                  />
> >
> > > > > > Consumers xml configuration is:
> >
> > > > > > >    <bus threadCount="20"
> > > > > > >         loadBalancerEndpoint="msmq://localhost/processor.loadbalancer.acceptingwork"
> > > > > > >         numberOfRetries="5"
> > > > > > >         endpoint="msmq://localhost/processor"
> > > > > > >         />
> >
> > > > > > >    <bus threadCount="20"
> > > > > > >         loadBalancerEndpoint="msmq://localhost/notifier.loadbalancer.acceptingwork"
> > > > > > >         numberOfRetries="5"
> > > > > > >         endpoint="msmq://localhost/notifier"
> > > > > > >         />
> >
> > > > > > On Nov 16, 3:13 pm, Corey Kaylor <[email protected]> wrote:
> > > > > > > > To summarize your setup:
> >
> > > > > > > > Load Balancer 1, configured for messages belonging to NamespaceA,
> > > > > > > > with 5 threads, deployed to MachineA\queue1
> > > > > > > >    1 worker endpoint sending ready for work to
> > > > > > > >    MachineA\queue1.readyforwork, configured with 20 threads,
> > > > > > > >    deployed to MachineA
> >
> > > > > > > > Load Balancer 2, configured for messages belonging to NamespaceB,
> > > > > > > > with 5 threads, deployed to MachineA\queue2
> > > > > > > >    1 worker endpoint sending ready for work to
> > > > > > > >    MachineA\queue2.readyforwork, configured with 20 threads,
> > > > > > > >    deployed to MachineA
> >
> > > > > > > > I assume that by staging server you mean a staging environment
> > > > > > > > that is configured similarly to the above, but with different
> > > > > > > > machine specs, as you've stated.
> >
> > > > > > > Is this correct?
> >
> > > > > > > > On Tue, Nov 15, 2011 at 8:12 PM, Michael Lyons <[email protected]> wrote:
> > > > > > > > > The load balancers are configured with the readyForWorkEndpoint
> > > > > > > > > attribute on the loadBalancer xml element.
> >
> > > > > > > > > The system is a quad-core 2.83GHz Core 2 Duo. On the staging
> > > > > > > > > server, which is running an older single-core 2.8GHz Xeon
> > > > > > > > > (Dell 2650) with hyper-threading, it sits at about 80%, and in
> > > > > > > > > production it sits between 40% and 80% on a quad-core 2.8GHz
> > > > > > > > > Xeon (Dell R210) where it is allocated 2 cores.
> >
> > > > > > > > > Forgot to mention that RSB is version 2.2.
> >
> > > > > > > > On Nov 16, 1:17 pm, Corey Kaylor <[email protected]> wrote:
> > > > > > > > > > Also, how many cores are on the load balancer machine? There
> > > > > > > > > > shouldn't be that much demand on the CPU, but having said that,
> > > > > > > > > > it really depends on the circumstances and environment.
> >
> > > > > > > > > > On Tue, Nov 15, 2011 at 7:15 PM, Corey Kaylor <[email protected]> wrote:
> > > > > > > > > > > Is each load balancer configured with a ready for work uri?
> >
> > > > > > > > > > > On Mon, Nov 14, 2011 at 5:06 PM, Michael Lyons <[email protected]> wrote:
> >
> > > > > > > > > > >> When using the load balancer with RSB I'm seeing the CPU run at
> > > > > > > > > > >> near 100% when the consumers are all busy, which causes the
> > > > > > > > > > >> consumers to run slower and be free less often.
> > > > > > > > > > >> It can be simulated easily by setting up a load balancer with
> > > > > > > > > > >> no consumers listening to it and trying to send out some
> > > > > > > > > > >> messages to the consumer.
> >
> > > > > > > > > > >> In my specific situation I have 2 load balancers with 5 threads
> > > > > > > > > > >> each (each load balancer runs a separate queue with different
> > > > > > > > > > >> types of messages), and there is a consumer waiting at the
> > > > > > > > > > >> other end of each load balancer with another 20 threads each.
> > > > > > > > > > >> If one of the load balancers gets congested then all consumers
> > > > > > > > > > >> run slow. When I ran the load balancer without load it averaged
> > > > > > > > > > >> ~200ms to process a message; once the load balancer was under
> > > > > > > > > > >> load (achieved by queuing over 1000 messages) it resulted in an
> > > > > > > > > > >> average time of ~1750ms, which results in the user waiting 8
> > > > > > > > > > >> times longer for their tasks to complete.
> >
> > > > > > > > > > >> Is there any way around this?
>
> --
> You received this message because you are subscribed to the Google Groups
> "Rhino Tools Dev" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/rhino-tools-dev?hl=en.
>
>

