Can you send a pull request?

On Nov 16, 2011, at 8:56 PM, Michael Lyons <[email protected]> wrote:
> Great news Corey, after a little bit of playing around I found what seems to be a possible solution.
>
> I reworked the code in MsmqLoadBalancer so that after a number of failures to contact a worker it pauses the thread for a second and resets the failure count back to zero. By doing so the load balancer dropped CPU usage to around 7%.
>
> It worked perfectly in the situation where a worker was busy and another worker process was started, alleviating the queue backlog without the load balancer trying to hog the system.
>
> My code for the change to MsmqLoadBalancer.HandleStandardMessage can be found here: http://pastebin.com/0PbC6ecB
>
> On Nov 17, 12:56 am, Corey Kaylor <[email protected]> wrote:
>> Ok, I'll take a look when I get into the office. I may suggest changes to make and have you try them out. I have run into similar issues in the past with Rhino Queues being too eager in peeking messages.
>>
>> On Wed, Nov 16, 2011 at 1:00 AM, Michael Lyons <[email protected]> wrote:
>>> Sorry about that last message, for some reason it lost its formatting.
>>>
>>> On Nov 16, 6:44 pm, Michael Lyons <[email protected]> wrote:
>>>> Strangely enough I'm going to be testing load balancing across physical servers next week, as I provisioned another server last week for the staging environment to test this out.
>>>>
>>>> In our case the workers get tied up contacting website services which can sometimes be really slow (up to 120 seconds), causing the load balancer's queue to grow. My idea with the load balancer was that I could spin up a new worker process when the queue becomes too large, which I can do currently and it works perfectly; it's just that the load balancer is consuming more resources than it needs to while the machine is really not under any other stress.
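The pause-and-reset change Michael describes above can be sketched generically. This is not the actual MsmqLoadBalancer code (that is at the pastebin link above); it is a minimal Python illustration of the same pattern, and the failure threshold and pause length here are made-up values, not the ones in the real change:

```python
import time


class FailureBackoff:
    """Sketch of the pause-after-N-failures idea from the fix above.

    MAX_FAILURES and PAUSE_SECONDS are illustrative, not the values
    used in the actual MsmqLoadBalancer change.
    """

    MAX_FAILURES = 5
    PAUSE_SECONDS = 1.0

    def __init__(self, sleep=time.sleep):
        self.failure_count = 0
        self._sleep = sleep  # injectable so the behavior can be tested

    def record_failure(self):
        """Called each time a worker could not be contacted."""
        self.failure_count += 1
        if self.failure_count >= self.MAX_FAILURES:
            # Instead of blindly retrying (and pegging the CPU),
            # pause the dispatching thread and start counting again.
            self._sleep(self.PAUSE_SECONDS)
            self.failure_count = 0

    def record_success(self):
        """Called when a message was successfully handed to a worker."""
        self.failure_count = 0
```

With something like this in place, a load balancer whose workers are all busy spends most of its time sleeping rather than spinning on the queue, which is consistent with the drop from near 100% to around 7% CPU reported above.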
>>>> I've just done some quick profiling and all the action seems to be called from AbstractMsmqListener.PeekMessageOnBackgroundThread. It spends 53% of its time in calls to MsmqLoadBalancer.HandlePeekedMessage and its children, with the remaining 47% in AbstractMsmqListener.TryPeek and its children.
>>>>
>>>> So over a total period of 4 minutes RSB consumed 183 seconds out of 240 seconds of CPU time, excluding my app's time, which I think is a bit excessive, particularly since it peeked at 226,130 messages. Shouldn't the load balancer pause for a second if it failed to get in contact with any of the workers, instead of just blindly retrying?
>>>>
>>>> Here are the top offenders; if you want I can email you a full csv (it's actually tab delimited) or a pdf.
>>>>
>>>> Total w/children (ms) | Avg w/children (ms) | Self (ms) | Avg self (ms) | Calls | Method
>>>> 183366 | 0.8 | 11384 | 0.1 | 226122 | Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.HandlePeekedMessage(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message)
>>>> 160651 | 0.7 | 4743 | 0 | 226130 | Rhino.ServiceBus.Msmq.AbstractMsmqListener.TryPeek(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message&)
>>>> 155724 | 0.7 | 155724 | 0.7 | 226130 | Rhino.ServiceBus.Msmq.OpenedQueue.Peek(System.TimeSpan)
>>>> 134270 | 0.6 | 134138 | 0.6 | 226125 | Rhino.ServiceBus.Msmq.OpenedQueue.TryGetMessageFromQueue(System.String)
>>>> 29787 | 0.2 | 1159 | 0 | 180430 | Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.HandleStandardMessage(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message)
>>>> 28569 | 0.2 | 28546 | 0.2 | 180430 | Rhino.ServiceBus.Msmq.OpenedQueue.Send(System.Messaging.Message)
>>>> 7759 | 0 | 1312 | 0 | 180432 | Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.PersistEndpoint(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message)
>>>> 3825 | 0 | 3825 | 0 | 180432 | Rhino.ServiceBus.Msmq.MsmqUtil.GetQueueUri(System.Messaging.MessageQueue)
>>>> 2622 | 0 | 2622 | 0 | 180431 | Rhino.ServiceBus.DataStructures.Set`1.Add(T)
>>>>
>>>> On Nov 16, 4:39 pm, Corey Kaylor <[email protected]> wrote:
>>>>> I am happy to take any form of contribution you can offer.
>>>>>
>>>>> By adding additional worker endpoints I mean:
>>>>>
>>>>> Load Balancer 1, 5 threads, deployed to MachineA
>>>>> Worker endpoint 1, configured to send to Machine1\queue1.readyforwork, 5 threads, deployed to NewMachineB
>>>>> Worker endpoint 2, configured to send to Machine1\queue1.readyforwork, 5 threads, deployed to NewMachineC
>>>>>
>>>>> Load balancing, although completely *possible* to run on one machine, was designed to distribute load to multiple machines. You're not gaining any benefit from load balancing when there is only one worker sending ready-for-work messages to the load balancer. You would be better off in this case just having two endpoints without load balancing.
>>>>>
>>>>> On Tue, Nov 15, 2011 at 10:28 PM, Michael Lyons <[email protected]> wrote:
>>>>>> I've run the EQATEC profiler against the code, and when the load balancer process is under load it records no activity between snapshots, indicating it is sitting in RSB code.
>>>>>>
>>>>>> I'd be happy to spot-profile RSB in my app and point out where the high CPU is coming from, but I'm assuming you already have a fair idea.
>>>>>>
>>>>>> What do you mean by adding additional worker endpoints? Can you point me to an example?
>>>>>>
>>>>>> On Nov 16, 3:40 pm, Corey Kaylor <[email protected]> wrote:
>>>>>>> I would try changing the thread counts on the consumers and the load balancer, and possibly add additional worker endpoint(s).
>>>>>>>
>>>>>>> Ayende in previous conversations has recommended thread counts that are equal to the number of cores on the machine. I have found that isn't always a perfect recipe.
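A quick back-of-the-envelope check of the profile Michael posted above, using only the numbers from that table, shows how hot the peek loop actually is:

```python
# Values copied from the profiler table above.
peeks = 226130      # calls to AbstractMsmqListener.TryPeek over the window
window_s = 4 * 60   # the 4-minute profiling window, in seconds
rsb_cpu_s = 183     # CPU seconds attributed to RSB over that window

peek_rate = peeks / window_s      # peeks per second against a mostly idle queue
cpu_share = rsb_cpu_s / window_s  # fraction of one core burned by RSB

print(round(peek_rate))        # ~942 peeks per second
print(round(cpu_share * 100))  # ~76% of one core over the whole window
```

At roughly 942 peeks a second with no worker free, pausing for a second after a handful of consecutive failures collapses that rate almost entirely, which lines up with the CPU drop Michael reports at the top of the thread.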
>>>>>>> So in our case we have run load tests, changing the thread configuration for each machine.
>>>>>>>
>>>>>>> When changing the thread counts on each test run, try to observe which specific process is utilizing the most CPU.
>>>>>>>
>>>>>>> There may be places to optimize for sure, but it sounds to me like threads are competing for priority.
>>>>>>>
>>>>>>> On Tue, Nov 15, 2011 at 9:24 PM, Michael Lyons <[email protected]> wrote:
>>>>>>>> Yes you're correct, it's a staging environment where we do our testing before releasing into production.
>>>>>>>>
>>>>>>>> That's pretty much the situation.
>>>>>>>>
>>>>>>>> Here are the xml configurations for the 2 load balancers:
>>>>>>>>
>>>>>>>> <loadBalancer threadCount="5"
>>>>>>>>     endpoint="msmq://localhost/notifier.loadbalancer"
>>>>>>>>     readyForWorkEndpoint="msmq://localhost/notifier.loadbalancer.acceptingwork"
>>>>>>>> />
>>>>>>>>
>>>>>>>> <loadBalancer threadCount="5"
>>>>>>>>     endpoint="msmq://localhost/processor.loadbalancer"
>>>>>>>>     readyForWorkEndpoint="msmq://localhost/processor.loadbalancer.acceptingwork"
>>>>>>>> />
>>>>>>>>
>>>>>>>> The consumers' xml configuration is:
>>>>>>>>
>>>>>>>> <bus threadCount="20"
>>>>>>>>     loadBalancerEndpoint="msmq://localhost/processor.loadbalancer.acceptingwork"
>>>>>>>>     numberOfRetries="5"
>>>>>>>>     endpoint="msmq://localhost/processor"
>>>>>>>> />
>>>>>>>>
>>>>>>>> <bus threadCount="20"
>>>>>>>>     loadBalancerEndpoint="msmq://localhost/notifier.loadbalancer.acceptingwork"
>>>>>>>>     numberOfRetries="5"
>>>>>>>>     endpoint="msmq://localhost/notifier"
>>>>>>>> />
>>>>>>>>
>>>>>>>> On Nov 16, 3:13 pm, Corey Kaylor <[email protected]> wrote:
>>>>>>>>> To summarize your setup:
>>>>>>>>> Load Balancer 1, configured for messages belonging to NamespaceA, with 5 threads, deployed to MachineA\queue1
>>>>>>>>> 1 worker endpoint sending ready-for-work to MachineA\queue1.readyforwork, configured with 20 threads, deployed to MachineA
>>>>>>>>>
>>>>>>>>> Load Balancer 2, configured for messages belonging to NamespaceB, with 5 threads, deployed to MachineA\queue2
>>>>>>>>> 1 worker endpoint sending ready-for-work to MachineA\queue2.readyforwork, configured with 20 threads, deployed to MachineA
>>>>>>>>>
>>>>>>>>> I assumed that by "staging server" you mean a staging environment that is configured similarly to the above but with different machine specs, as you've stated.
>>>>>>>>>
>>>>>>>>> Is this correct?
>>>>>>>>>
>>>>>>>>> On Tue, Nov 15, 2011 at 8:12 PM, Michael Lyons <[email protected]> wrote:
>>>>>>>>>> The load balancers are configured with the readyForWorkEndpoint attribute on the loadBalancer xml element.
>>>>>>>>>>
>>>>>>>>>> The system is a quad core 2.83GHz Core 2 Duo. On the staging server, which is running an older single core 2.8GHz Xeon (Dell 2650) with hyper-threading, it sits at about 80%, and in production it sits between 40 and 80% on a quad core 2.8GHz Xeon (Dell R210) where it is allocated 2 cores.
>>>>>>>>>>
>>>>>>>>>> Forgot to mention that RSB is version 2.2.
>>>>>>>>>>
>>>>>>>>>> On Nov 16, 1:17 pm, Corey Kaylor <[email protected]> wrote:
>>>>>>>>>>> Also, how many cores are on the load balancer machine? There shouldn't be that much demand on the CPU, but having said that it really depends on the circumstances and environment.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 15, 2011 at 7:15 PM, Corey Kaylor <[email protected]> wrote:
>>>>>>>>>>>> Is each load balancer configured with a ready-for-work uri?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Nov 14, 2011 at 5:06 PM, Michael Lyons <[email protected]> wrote:
>>>>>>>>>>>>> When using the load balancer with RSB I'm seeing the CPU run at near 100% when the consumers are all busy, which causes the consumers to run slower and be free less often.
>>>>>>>>>>>>> It can be simulated easily by setting up a load balancer with no consumers listening to it and trying to send out some messages to the consumer.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In my specific situation I have 2 load balancers with 5 threads each (each load balancer runs a separate queue ...

--
You received this message because you are subscribed to the Google Groups "Rhino Tools Dev" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/rhino-tools-dev?hl=en.
