Can you send a pull request?

On Nov 16, 2011, at 8:56 PM, Michael Lyons <[email protected]> wrote:
> Great news Corey, after a little bit of playing around I found what seems to be a possible solution.
>
> I reworked the code in MsmqLoadBalancer so that after a number of failures to contact a worker it pauses the thread for a second and resets the failure count back to zero. By doing so the load balancer dropped CPU usage to around 7%.
>
> It worked perfectly in the situation where a worker was busy and another worker process was started, alleviating the queue backlog without the load balancer trying to hog the system.
>
> My code for the change to MsmqLoadBalancer.HandleStandardMessage can be found here: http://pastebin.com/0PbC6ecB
>
> On Nov 17, 12:56 am, Corey Kaylor <[email protected]> wrote:
>> Ok, I'll take a look when I get into the office. I may suggest changes to make and have you try them out. I have run into similar issues in the past with Rhino Queues being too eager in peeking messages.
>>
>> On Wed, Nov 16, 2011 at 1:00 AM, Michael Lyons <[email protected]> wrote:
>>> Sorry about that last message, for some reason it lost its formatting.
>>>
>>> On Nov 16, 6:44 pm, Michael Lyons <[email protected]> wrote:
>>>> Strangely enough I'm going to be testing load balancing across physical servers next week, as I provisioned another server last week for the staging environment to test this out.
>>>>
>>>> In our case the workers get tied up contacting website services which can sometimes be really slow (up to 120 seconds), causing the load balancer's queue to grow. My idea with the load balancer was that I could spin up a new worker process when the queue becomes too large, which I can do currently and it works perfectly; it's just that the load balancer is consuming more resources than it needs to while the machine is really not under any other stress.
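The pause-and-reset change Michael describes above can be sketched generically. This is not the actual MsmqLoadBalancer code (that is at the pastebin link above); it is a minimal Python illustration of the same pattern, and the failure threshold and pause length here are made-up values, not the ones in the real change:

```python
import time


class FailureBackoff:
    """Sketch of the pause-after-N-failures idea from the fix above.

    MAX_FAILURES and PAUSE_SECONDS are illustrative, not the values
    used in the actual MsmqLoadBalancer change.
    """

    MAX_FAILURES = 5
    PAUSE_SECONDS = 1.0

    def __init__(self, sleep=time.sleep):
        self.failure_count = 0
        self._sleep = sleep  # injectable so the behavior can be tested

    def record_failure(self):
        """Called each time a worker could not be contacted."""
        self.failure_count += 1
        if self.failure_count >= self.MAX_FAILURES:
            # Instead of blindly retrying (and pegging the CPU),
            # pause the dispatching thread and start counting again.
            self._sleep(self.PAUSE_SECONDS)
            self.failure_count = 0

    def record_success(self):
        """Called when a message was successfully handed to a worker."""
        self.failure_count = 0
```

With something like this in place, a load balancer whose workers are all busy spends most of its time sleeping rather than spinning on the queue, which is consistent with the drop from near 100% to around 7% CPU reported above.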
>>>> I've just done some quick profiling and all the action seems to be called from AbstractMsmqListener.PeekMessageOnBackgroundThread. It spends 53% of its time in calls to MsmqLoadBalancer.HandlePeekedMessage and its children, with the remaining 47% in AbstractMsmqListener.TryPeek and its children.
>>>>
>>>> So over a total period of 4 minutes RSB consumed 183 seconds out of 240 seconds of CPU time, excluding my app's time, which I think is a bit excessive, particularly since it peeked at 226,130 messages. Shouldn't the load balancer pause for a second if it failed to get in contact with any of the workers, instead of just blindly retrying?
>>>>
>>>> Here are the top offenders; if you want I can email you a full csv (it's actually tab delimited) or a pdf.
>>>>
>>>> Total w/children (ms) | Avg w/children (ms) | Self (ms) | Avg self (ms) | Calls | Method
>>>> 183366 | 0.8 | 11384 | 0.1 | 226122 | Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.HandlePeekedMessage(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message)
>>>> 160651 | 0.7 | 4743 | 0 | 226130 | Rhino.ServiceBus.Msmq.AbstractMsmqListener.TryPeek(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message&)
>>>> 155724 | 0.7 | 155724 | 0.7 | 226130 | Rhino.ServiceBus.Msmq.OpenedQueue.Peek(System.TimeSpan)
>>>> 134270 | 0.6 | 134138 | 0.6 | 226125 | Rhino.ServiceBus.Msmq.OpenedQueue.TryGetMessageFromQueue(System.String)
>>>> 29787 | 0.2 | 1159 | 0 | 180430 | Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.HandleStandardMessage(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message)
>>>> 28569 | 0.2 | 28546 | 0.2 | 180430 | Rhino.ServiceBus.Msmq.OpenedQueue.Send(System.Messaging.Message)
>>>> 7759 | 0 | 1312 | 0 | 180432 | Rhino.ServiceBus.LoadBalancer.MsmqLoadBalancer.PersistEndpoint(Rhino.ServiceBus.Msmq.OpenedQueue, System.Messaging.Message)
>>>> 3825 | 0 | 3825 | 0 | 180432 | Rhino.ServiceBus.Msmq.MsmqUtil.GetQueueUri(System.Messaging.MessageQueue)
>>>> 2622 | 0 | 2622 | 0 | 180431 | Rhino.ServiceBus.DataStructures.Set`1.Add(T)
>>>>
>>>> On Nov 16, 4:39 pm, Corey Kaylor <[email protected]> wrote:
>>>>> I am happy to take any form of contribution you can offer.
>>>>>
>>>>> By adding additional worker endpoints I mean:
>>>>>
>>>>> Load Balancer 1, 5 threads, deployed to MachineA
>>>>> Worker endpoint 1, configured to send to Machine1\queue1.readyforwork, 5 threads, deployed to NewMachineB
>>>>> Worker endpoint 2, configured to send to Machine1\queue1.readyforwork, 5 threads, deployed to NewMachineC
>>>>>
>>>>> Load balancing, although completely *possible* to run on one machine, was designed to distribute load to multiple machines. You're not gaining any benefit from load balancing when there is only one worker sending ready-for-work messages to the load balancer. You would be better off in this case just having two endpoints without load balancing.
>>>>>
>>>>> On Tue, Nov 15, 2011 at 10:28 PM, Michael Lyons <[email protected]> wrote:
>>>>>> I've run the EQATEC profiler against the code, and when the load balancer process is under load it records no activity between snapshots, indicating it is sitting in RSB code.
>>>>>>
>>>>>> I'd be happy to spot-profile RSB in my app and point out where the high CPU is coming from, but I'm assuming you already have a fair idea.
>>>>>>
>>>>>> What do you mean by adding additional worker endpoints? Can you point me to an example?
>>>>>>
>>>>>> On Nov 16, 3:40 pm, Corey Kaylor <[email protected]> wrote:
>>>>>>> I would try changing the thread counts on the consumers and the load balancer, and possibly add additional worker endpoint(s).
>>>>>>>
>>>>>>> Ayende in previous conversations has recommended thread counts that are equal to the number of cores on the machine. I have found that isn't always a perfect recipe.
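A quick back-of-the-envelope check of the profile Michael posted above, using only the numbers from that table, shows how hot the peek loop actually is:

```python
# Values copied from the profiler table above.
peeks = 226130      # calls to AbstractMsmqListener.TryPeek over the window
window_s = 4 * 60   # the 4-minute profiling window, in seconds
rsb_cpu_s = 183     # CPU seconds attributed to RSB over that window

peek_rate = peeks / window_s      # peeks per second against a mostly idle queue
cpu_share = rsb_cpu_s / window_s  # fraction of one core burned by RSB

print(round(peek_rate))        # ~942 peeks per second
print(round(cpu_share * 100))  # ~76% of one core over the whole window
```

At roughly 942 peeks a second with no worker free, pausing for a second after a handful of consecutive failures collapses that rate almost entirely, which lines up with the CPU drop Michael reports at the top of the thread.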
>>>>>>> So in our case we have run load tests, changing the thread configuration for each machine.
>>>>>>>
>>>>>>> When changing the thread counts on each test run, try to observe which specific process is utilizing the most CPU.
>>>>>>>
>>>>>>> There may be places to optimize for sure, but it sounds to me like threads are competing for priority.
>>>>>>>
>>>>>>> On Tue, Nov 15, 2011 at 9:24 PM, Michael Lyons <[email protected]> wrote:
>>>>>>>> Yes you're correct, it's a staging environment where we do our testing before releasing into production.
>>>>>>>>
>>>>>>>> That's pretty much the situation.
>>>>>>>>
>>>>>>>> Here are the xml configurations for the 2 load balancers:
>>>>>>>>
>>>>>>>> <loadBalancer threadCount="5"
>>>>>>>>     endpoint="msmq://localhost/notifier.loadbalancer"
>>>>>>>>     readyForWorkEndpoint="msmq://localhost/notifier.loadbalancer.acceptingwork"
>>>>>>>> />
>>>>>>>>
>>>>>>>> <loadBalancer threadCount="5"
>>>>>>>>     endpoint="msmq://localhost/processor.loadbalancer"
>>>>>>>>     readyForWorkEndpoint="msmq://localhost/processor.loadbalancer.acceptingwork"
>>>>>>>> />
>>>>>>>>
>>>>>>>> The consumers' xml configuration is:
>>>>>>>>
>>>>>>>> <bus threadCount="20"
>>>>>>>>     loadBalancerEndpoint="msmq://localhost/processor.loadbalancer.acceptingwork"
>>>>>>>>     numberOfRetries="5"
>>>>>>>>     endpoint="msmq://localhost/processor"
>>>>>>>> />
>>>>>>>>
>>>>>>>> <bus threadCount="20"
>>>>>>>>     loadBalancerEndpoint="msmq://localhost/notifier.loadbalancer.acceptingwork"
>>>>>>>>     numberOfRetries="5"
>>>>>>>>     endpoint="msmq://localhost/notifier"
>>>>>>>> />
>>>>>>>>
>>>>>>>> On Nov 16, 3:13 pm, Corey Kaylor <[email protected]> wrote:
>>>>>>>>> To summarize your setup:
>>>>>>>>> Load Balancer 1, configured for messages belonging to NamespaceA, with 5 threads, deployed to MachineA\queue1
>>>>>>>>> 1 worker endpoint sending ready-for-work to MachineA\queue1.readyforwork, configured with 20 threads, deployed to MachineA
>>>>>>>>>
>>>>>>>>> Load Balancer 2, configured for messages belonging to NamespaceB, with 5 threads, deployed to MachineA\queue2
>>>>>>>>> 1 worker endpoint sending ready-for-work to MachineA\queue2.readyforwork, configured with 20 threads, deployed to MachineA
>>>>>>>>>
>>>>>>>>> I assumed that by "staging server" you mean a staging environment that is configured similarly to the above but with different machine specs, as you've stated.
>>>>>>>>>
>>>>>>>>> Is this correct?
>>>>>>>>>
>>>>>>>>> On Tue, Nov 15, 2011 at 8:12 PM, Michael Lyons <[email protected]> wrote:
>>>>>>>>>> The load balancers are configured with the readyForWorkEndpoint attribute on the loadBalancer xml element.
>>>>>>>>>>
>>>>>>>>>> The system is a quad core 2.83GHz Core 2 Duo. On the staging server, which is running an older single core 2.8GHz Xeon (Dell 2650) with hyper-threading, it sits at about 80%, and in production it sits between 40 and 80% on a quad core 2.8GHz Xeon (Dell R210) where it is allocated 2 cores.
>>>>>>>>>>
>>>>>>>>>> Forgot to mention that RSB is version 2.2.
>>>>>>>>>>
>>>>>>>>>> On Nov 16, 1:17 pm, Corey Kaylor <[email protected]> wrote:
>>>>>>>>>>> Also, how many cores are on the load balancer machine? There shouldn't be that much demand on the CPU, but having said that it really depends on the circumstances and environment.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 15, 2011 at 7:15 PM, Corey Kaylor <[email protected]> wrote:
>>>>>>>>>>>> Is each load balancer configured with a ready-for-work uri?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Nov 14, 2011 at 5:06 PM, Michael Lyons <[email protected]> wrote:
>>>>>>>>>>>>> When using the load balancer with RSB I'm seeing the CPU run at near 100% when the consumers are all busy, which causes the consumers to run slower and be free less often.
>>>>>>>>>>>>> It can be simulated easily by setting up a load balancer with no consumers listening to it and trying to send out some messages to the consumer.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In my specific situation I have 2 load balancers with 5 threads each (each load balancer runs a separate queue ...

--
You received this message because you are subscribed to the Google Groups "Rhino Tools Dev" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/rhino-tools-dev?hl=en.
