Tim --
Yes, it is at least one problem. I've applied and tested a very small
targeted fix that seems to work. The problem is that this code is
effectively a state machine, which I haven't had the time to fully
understand. Making changes to that type of code always makes me nervous
because there are always more states than you realize at first.
I will include my patch with the bug report, but I wouldn't recommend
taking it as a long-term solution.
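To give a sense of the shape of the change: this is just a simplified sketch
of the idea, not the actual patch. Reconnect() and TryConnectToNextUri() are
made-up placeholders for the real reconnect logic; only the connectionFailure
and ConnectedTransport names come from the real FailoverTransport.

    // Simplified sketch only, not the real FailoverTransport code.
    private void Reconnect()
    {
        // TryConnectToNextUri() stands in for whatever actually
        // establishes the new transport.
        ITransport transport = TryConnectToNextUri();
        if (transport != null)
        {
            ConnectedTransport = transport;
            // The key point: clear the stored failure once a reconnect
            // succeeds, so the guard at the top of Iterate() stops
            // returning false forever while the failover task keeps
            // getting rescheduled.
            connectionFailure = null;
        }
    }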
For the next few weeks, I'm swamped with trying to get a release out the door
at work and finals at school. After that, I may have a chance to look into
the failover code in more detail.
Thanks,
Ted C.
Timothy Bish wrote:
>
> On Thu, 2010-03-04 at 17:03 -0800, Ted C. wrote:
>> I'm happy to do so, will probably be tomorrow. Just as a note, I think
>> that I'm getting this because in FailoverTransport there's the following if:
>>
>> if (ConnectedTransport != null || disposed ||
>>     connectionFailure != null)
>> {
>>     return false;
>> }
>> else
>>
>>
>>
>> it appears (as in I've seen this in a couple of iterations and haven't
>> gotten back to it, yet) that connectionFailure is not null, so there's an
>> immediate return false and the loop spins.
>> Speaking of which, I'm not sure I see a way for connectionFailure to ever
>> become null again. It appears that it's only assigned in the else part
>> of the if above. Am I missing something?
>>
>> Thanks,
>
> It's quite possible that this is the problem. I've not had time yet to
> test this. The FailoverTransport code is in need of a code review, so
> it's not surprising there are some issues in there.
>
> Regards
> Tim.
>
>
>>
>> Ted C.
>>
>>
>>
>> Timothy Bish wrote:
>> >
>> > On Tue, 2010-03-02 at 17:27 -0800, Ted C. wrote:
>> >> It appears that NMS is pegging the CPU. In my scenario, there's one
>> >> broker running and the broker goes down. When that happens, my CPU
>> >> utilization goes to 100% and never recovers.
>> >>
>> >> When I break into the program, I see that FailoverTask.Iterate is
>> >> getting called frequently. I ran it under dotTrace and got the following:
>> >>
>> >> 32.70 % Thread #105762776 - 14308 ms - 0 calls
>> >> 32.70 % System.Threading._ThreadPoolWaitCallback.PerformWaitCallback... - 14308* ms - 0 calls
>> >> 32.70 % Run - 14308* ms - 0 calls - Apache.NMS.ActiveMQ.Threads.PooledTaskRunner.Run(Object)
>> >> 32.70 % RunTask - 14308* ms - 0 calls - Apache.NMS.ActiveMQ.Threads.PooledTaskRunner.RunTask()
>> >> 32.70 % Iterate - 14308* ms - 0 calls - Apache.NMS.ActiveMQ.Transport.Failover.FailoverTransport.FailoverTask.Iterate()
>> >> 23.59 % WaitOne - 10323* ms - 0 calls - System.Threading.WaitHandle.WaitOne()
>> >> 8.22 % ReleaseMutex - 3597 ms - 0 calls - System.Threading.Mutex.ReleaseMutex()
>> >> 0.67 % get_ConnectedTransport - 291 ms - 0 calls - Apache.NMS.ActiveMQ.Transport.Failover.FailoverTransport.get_ConnectedTransport()
>> >> 0.22 % DoConnect - 97 ms - 0 calls - Apache.NMS.ActiveMQ.Transport.Failover.FailoverTransport.DoConnect()
>> >>
>> >> Anybody seen similar issues? This is ActiveMQ 5.3 and NMS 1.2.0.
>> >>
>> >> Thanks,
>> >>
>> >> Ted C.
>> >>
>> >
>> > This isn't an issue that's been reported yet. Could you raise a new
>> > Jira issue regarding this? I'd expect the initial failure to cause a
>> > spike in CPU, but the reconnect delay should cause that to settle
>> > down as it increases.
>> >
>> > Regards
>> >
>> >
>> > --
>> > Tim Bish
>> >
>> > Open Source Integration: http://fusesource.com
>> > ActiveMQ in Action: http://www.manning.com/snyder/
>> >
>> > Follow me on Twitter: http://twitter.com/tabish121
>> > My Blog: http://timbish.blogspot.com/
>> >
>> >
>> >
>>
>
> --
> Tim Bish
>
> Open Source Integration: http://fusesource.com
> ActiveMQ in Action: http://www.manning.com/snyder/
>
> Follow me on Twitter: http://twitter.com/tabish121
> My Blog: http://timbish.blogspot.com/
>
>
>
--
View this message in context:
http://old.nabble.com/NMS-Failover-transport-pegging-CPU-tp27763465p27824106.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.