Sorry guys, I didn't realise there was one until you told me; maybe add a link to it on the wiki. The pull request has been sent.

On Nov 17, 3:18 pm, Kaylor Mail <[email protected]> wrote:
> Can you send a pull request?
>
> On Nov 16, 2011, at 8:56 PM, Michael Lyons <[email protected]> wrote:
>> Great news Corey, after a little bit of playing around I found what seems to be a possible solution.
>>
>> I reworked the code in MsmqLoadBalancer so that after a number of failures to contact a worker it pauses the thread for a second and resets the failure count back to zero. With that change the load balancer's CPU usage dropped to around 7%.
>>
>> It worked perfectly in the situation where a worker was busy and another worker process was started to alleviate the queue backlog, without the load balancer trying to hog the system.
>>
>> My code for the change to MsmqLoadBalancer.HandleStandardMessage can be found here: http://pastebin.com/0PbC6ecB
>>
>> On Nov 17, 12:56 am, Corey Kaylor <[email protected]> wrote:
>>> Ok, I'll take a look when I get into the office. I may suggest changes to make and have you try them out. I have run into similar issues with Rhino Queues being too eager in peeking messages in the past.
>>>
>>> On Wed, Nov 16, 2011 at 1:00 AM, Michael Lyons <[email protected]> wrote:
>>>> Sorry about that last message, for some reason it lost its formatting.
>>>>
>>>> On Nov 16, 6:44 pm, Michael Lyons <[email protected]> wrote:
>>>>> Strangely enough I'm going to be testing load balancing across physical servers next week, as I provisioned another server last week for the staging environment to test this out.
>>>>>
>>>>> In our case the workers get tied up contacting website services which can sometimes be really slow (up to 120 seconds), causing the load balancer's queue to grow.
>>>>> My idea with the load balancer was that I could spin up a new worker process when the queue becomes too large, which is what I can do currently and it works perfectly; it's just that the load balancer is consuming more resources than it needs to while the machine is otherwise not under any stress.
>>>>>
>>>>> I've just done some quick profiling, and all the activity seems to be called from AbstractMsmqListener.PeekMessageOnBackgroundThread. It spends 53% of its time in calls to MsmqLoadBalancer.HandlePeekedMessage and its children, with the remaining 47% in AbstractMsmqListener.TryPeek and its children.
>>>>>
>>>>> So over a total period of 4 minutes, RSB consumed 183 seconds out of 240 seconds of CPU time, excluding my app's time, which I think is a bit excessive, particularly since it peeked at 226,130 messages. Shouldn't the load balancer pause for a second if it failed to get in contact with any of the workers, instead of just blindly retrying?
>>>>>
>>>>> Here are the top offenders in CSV format; if you want I can email you a full CSV (it's actually tab delimited) or a PDF.
>>>>> Total Time with children (ms), Average Time with children (ms), Total for self (ms), Average for self (ms), Calls, Method name
>>>>> 183366, 0.8, 11384, 0.1, 226122, Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.HandlePeekedMessage(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message)
>>>>> 160651, 0.7, 4743, 0, 226130, Rhino.ServiceBus.Msmq.AbstractMsmqListener.TryPeek(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message&)
>>>>> 155724, 0.7, 155724, 0.7, 226130, Rhino.ServiceBus.Msmq.OpenedQueue.Peek(System.TimeSpan)
>>>>> 134270, 0.6, 134138, 0.6, 226125, Rhino.ServiceBus.Msmq.OpenedQueue.TryGetMessageFromQueue(System.String)
>>>>> 29787, 0.2, 1159, 0, 180430, Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.HandleStandardMessage(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message)
>>>>> 28569, 0.2, 28546, 0.2, 180430, Rhino.ServiceBus.Msmq.OpenedQueue.Send(System.Messaging.Message)
>>>>> 7759, 0, 1312, 0, 180432, Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.PersistEndpoint(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message)
>>>>> 3825, 0, 3825, 0, 180432, Rhino.ServiceBus.Msmq.MsmqUtil.GetQueueUri(System.Messaging.MessageQueue)
>>>>> 2622, 0, 2622, 0, 180431, Rhino.ServiceBus.DataStructures.Set`1.Add(T)
>>>>>
>>>>> On Nov 16, 4:39 pm, Corey Kaylor <[email protected]> wrote:
>>>>>> I am happy to take any form of contribution you can offer.
>>>>>>
>>>>>> By adding additional worker endpoints I mean:
>>>>>>
>>>>>> Load Balancer 1, 5 threads, deployed to MachineA
>>>>>> Worker endpoint 1, configured to send to Machine1\queue1.readyforwork, 5 threads, deployed to NewMachineB
>>>>>> Worker endpoint 2, configured to send to Machine1\queue1.readyforwork, 5 threads, deployed to NewMachineC
>>>>>>
>>>>>> Load balancing, although completely *possible* to run on one machine, was designed to distribute load to multiple machines.
>>>>>> You're not gaining any benefits from load balancing when there is only one worker sending ready-for-work messages to the load balancer. You would be better off in this case just having two endpoints without load balancing.
>>>>>>
>>>>>> On Tue, Nov 15, 2011 at 10:28 PM, Michael Lyons <[email protected]> wrote:
>>>>>>> I've run the EQATEC profiler against the code, and when the load balancer process is under load it records no activity between snapshots, indicating it is sitting in RSB code.
>>>>>>>
>>>>>>> I'd be happy to spot-profile RSB in my app and point out where the high CPU is coming from, but I'm assuming you already have a fair idea.
>>>>>>>
>>>>>>> What do you mean by adding additional worker endpoints? Can you point me to an example?
>>>>>>>
>>>>>>> On Nov 16, 3:40 pm, Corey Kaylor <[email protected]> wrote:
>>>>>>>> I would try changing the thread counts on the consumers and the load balancer, and possibly add additional worker endpoint(s).
>>>>>>>>
>>>>>>>> Ayende in previous conversations has recommended thread counts equal to the number of cores on the machine. I have found that isn't always a perfect recipe, so in our case we have run load tests while changing the thread configuration for each machine.
>>>>>>>>
>>>>>>>> When changing the thread counts on each test run, try to observe which specific process is utilizing the most CPU.
>>>>>>>>
>>>>>>>> There may be places to optimize for sure, but it sounds to me like threads are competing for priority.
>>>>>>>>
>>>>>>>> On Tue, Nov 15, 2011 at 9:24 PM, Michael Lyons <[email protected]> wrote:
>>>>>>>>> Yes you're correct, it's a staging environment where we do our testing before releasing into production.
>>>>>>>>>
>>>>>>>>> That's pretty much the situation.
>>>>>>>>> Here are the xml configurations for the 2 load balancers:
>>>>>>>>>
>>>>>>>>> <loadBalancer threadCount="5"
>>>>>>>>>     endpoint="msmq://localhost/notifier.loadbalancer"
>>>>>>>>>     readyForWorkEndpoint="msmq://localhost/notifier.loadbalancer.acceptingwork"
>>>>>>>>> />
>>>>>>>>>
>>>>>>>>> <loadBalancer threadCount="5"
>>>>>>>>>     endpoint="msmq://localhost/processor.loadbalancer"
>>>>>>>>>     readyForWorkEndpoint="msmq://localhost/processor.loadbalancer.acceptingwork"
>>>>>>>>> />
>>>>>>>>>
>>>>>>>>> The consumers' xml configuration is:
>>>>>>>>>
>>>>>>>>> <bus threadCount="20"
>>>>>>>>>     loadBalancerEndpoint="msmq://localhost/processor.loadbalancer.acceptingwork"
>>>>>>>>>     numberOfRetries="5"
>>>>>>>>>     endpoint="msmq://localhost/processor"
>>>>>>>>> />
>>>>>>>>>
>>>>>>>>> <bus threadCount="20"
>>>>>>>>>     loadBalancerEndpoint="msmq://localhost/notifier.loadbalancer.acceptingwork"
>>>>>>>>>     numberOfRetries="5"
>>>>>>>>>     endpoint="msmq://localhost/notifier"
>>>>>>>>> />
>>>>>>>>>
>>>>>>>>> On Nov 16, 3:13 pm, Corey Kaylor <[email protected]> wrote:
>>>>>>>>>> To summarize your setup:
>>>>>>>>>>
>>>>>>>>>> Load Balancer 1, configured for messages belonging to NamespaceA, with 5 threads, deployed to MachineA\queue1
>>>>>>>>>> 1 worker endpoint sending ready for work to MachineA\queue1.readyforwork, configured with 20 threads, deployed to MachineA
>>>>>>>>>>
>>>>>>>>>> Load Balancer 2, configured for messages belonging to NamespaceB, with 5 threads, deployed to MachineA\queue2
>>>>>>>>>> 1 worker endpoint sending ready for work to MachineA\queue2.readyforwork, configured with 20 threads, deployed to MachineA
>>>>>>>>>>
>>>>>>>>>> I assumed by staging server you mean a staging environment that is configured similarly to the above but with different machine specs, as you've stated.
>>>>>>>>>>
>>>>>>>>>> Is this correct?
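As background for the exchange above: the ready-for-work handshake that these configurations wire up (workers announce readiness on the `acceptingwork` endpoint, and the balancer hands out one message per ready worker) can be modelled roughly like this. This is a toy Python sketch for illustration only; the class and method names are invented and are not RSB's API.

```python
from collections import deque

class ToyLoadBalancer:
    """Toy model of a ready-for-work load balancer (illustrative only)."""

    def __init__(self):
        self.pending = deque()   # messages waiting to be dispatched
        self.ready = deque()     # workers that announced "ready for work"

    def ready_for_work(self, worker):
        # A worker signals it can take one more message.
        self.ready.append(worker)
        self._dispatch()

    def accept(self, message):
        # A new message arrives on the balancer's endpoint.
        self.pending.append(message)
        self._dispatch()

    def _dispatch(self):
        # Hand out messages only while a worker is ready; with no ready
        # workers, messages simply queue up -- the situation in which the
        # balancer should back off rather than spin on the CPU.
        while self.pending and self.ready:
            worker = self.ready.popleft()
            worker(self.pending.popleft())

# With one ready worker and three messages, exactly one message is
# dispatched; the other two wait for further "ready" announcements.
handled = []
lb = ToyLoadBalancer()
lb.ready_for_work(handled.append)
for m in ["a", "b", "c"]:
    lb.accept(m)
print(handled, list(lb.pending))  # ['a'] ['b', 'c']
```

This also illustrates Corey's point earlier in the thread: with a single worker, the balancer only ever forwards to one place, so the ready-for-work round trip adds overhead without distributing anything.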
>>>>>>>>>> On Tue, Nov 15, 2011 at 8:12 PM, Michael Lyons <[email protected]> wrote:
>>>>>>>>>>> The load balancers are configured with the readyForWorkEndpoint attribute on the loadBalancer xml element.
>>>>>>>>>>>
>>>>>>>>>>> The system is a quad-core 2.83GHz Core 2 Duo; on the staging server, which is running an older single-core 2.8GHz Xeon (Dell 2650) with hyper-threading, it sits at about 80%, and in production it sits between 40% and 80% on a quad-core 2.8GHz Xeon (Dell R210) where it is allocated 2 cores.
>>>>>>>>>>>
>>>>>>>>>>> Forgot to mention that RSB is version 2.2.
>>>>>>>>>>>
>>>>>>>>>>> On Nov 16, 1:17 pm, Corey Kaylor <[email protected]> wrote:
>>>>>>>>>>>> Also, how many cores are on the load balancer machine? There shouldn't be that much demand on the CPU, but having said that, it really depends on the circumstances and environment.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Nov 15, 2011 at 7:15 PM, Corey Kaylor <[email protected]> wrote:
>>>>>>>>>>>>> Is each load balancer configured with a ready-for-work uri?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Nov 14, 2011 at 5:06 PM, Michael Lyons <[email protected]> wrote:
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> read more »
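The fix described at the top of the thread (after a number of failures to contact a worker, pause the thread for a second and reset the failure count) can be sketched as follows. This is a minimal Python illustration of the back-off technique, not the actual C# change from the pastebin link; all names here are invented for the sketch.

```python
import time

# Minimal sketch, assuming: try_dispatch() returns True when a message
# was handed to a worker and False when no worker could be reached.
# After MAX_FAILURES consecutive misses, sleep and reset the counter
# instead of immediately peeking the queue again.

MAX_FAILURES = 5
BACKOFF_SECONDS = 1.0

def dispatch_loop(try_dispatch, should_stop, sleep=time.sleep):
    failures = 0
    while not should_stop():
        if try_dispatch():
            failures = 0                  # made progress: clear the counter
        else:
            failures += 1
            if failures >= MAX_FAILURES:  # stop spinning on the CPU
                sleep(BACKOFF_SECONDS)
                failures = 0              # reset, as in the described change

# Example: 12 straight misses then one success produce two back-off
# sleeps instead of 12 back-to-back retries burning CPU time.
if __name__ == "__main__":
    attempts = iter([False] * 12 + [True])
    sleeps = []
    done = []
    def try_dispatch():
        ok = next(attempts)
        if ok:
            done.append(True)
        return ok
    dispatch_loop(try_dispatch, lambda: bool(done),
                  sleep=lambda s: sleeps.append(s))
    print(len(sleeps))  # prints 2
```

The key property, matching the ~7% CPU figure reported above, is that a saturated worker pool costs at most one dispatch attempt per MAX_FAILURES misses plus a one-second sleep, rather than an unbounded peek loop.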
--
You received this message because you are subscribed to the Google Groups "Rhino Tools Dev" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/rhino-tools-dev?hl=en.
