Tim --
The issue number is AMQNET-241.
Thanks,
Ted C.
Ted C. wrote:
>
> Tim --
>
> Yes, it is at least one problem. I've applied and tested a very small
> targeted fix that seems to work. The problem is that this code is
> effectively a state machine, which I haven't had the time to fully
> understand. Making changes to that type of code always makes me nervous
> because there are always more states than you realize at first.
>
> I will include my patch with the bug but I would probably recommend not
> taking it as a long-term solution.
>
> For the next few weeks, I'm swamped with trying to get a release out the
> door at work and finals at school. After that, I may have a chance to
> look into the failover code in more detail.
>
> Thanks,
>
> Ted C.
>
>
> Timothy Bish wrote:
>>
>> On Thu, 2010-03-04 at 17:03 -0800, Ted C. wrote:
>>> I'm happy to do so, will probably be tomorrow. Just as a note, I think
>>> that I'm getting this because in FailoverTransport there's the following if:
>>>
>>> if(ConnectedTransport != null || disposed || connectionFailure != null)
>>> {
>>>     return false;
>>> }
>>> else
>>>
>>>
>>>
>>> It appears (as in I've seen this in a couple of iterations and haven't
>>> gotten back to it yet) that connectionFailure is not null, so there's an
>>> immediate return false and the loop spins.
>>>
>>> Speaking of which, I'm not sure I see a way for connectionFailure to ever
>>> become null again. It appears that it's only assigned in the else part of
>>> the if above. Am I missing something?
>>>
>>> Thanks,
>>
>> It's quite possible that this is the problem. I've not had time yet to
>> test this. The FailoverTransport code is in need of a code review, so
>> it's not surprising there are some issues in there.
>>
>> Regards
>> Tim.
>>
>>
>>>
>>> Ted C.
>>>
>>>
>>>
>>> Timothy Bish wrote:
>>> >
>>> > On Tue, 2010-03-02 at 17:27 -0800, Ted C. wrote:
>>> >> It appears that NMS is pegging the CPU. In my scenario, there's one
>>> >> broker running and the broker goes down. When that happens, my CPU
>>> >> utilization goes to 100% and never recovers.
>>> >>
>>> >> When I break into the program, I see that FailoverTask.Iterate is
>>> >> getting called frequently. I ran it under dotTrace and got the following:
>>> >>
>>> >> 32.70 % Thread #105762776 - 14308 ms - 0 calls
>>> >>   32.70 % System.Threading._ThreadPoolWaitCallback.PerformWaitCallback... - 14308* ms - 0 calls
>>> >>     32.70 % Run - 14308* ms - 0 calls - Apache.NMS.ActiveMQ.Threads.PooledTaskRunner.Run(Object)
>>> >>       32.70 % RunTask - 14308* ms - 0 calls - Apache.NMS.ActiveMQ.Threads.PooledTaskRunner.RunTask()
>>> >>         32.70 % Iterate - 14308* ms - 0 calls - Apache.NMS.ActiveMQ.Transport.Failover.FailoverTransport.FailoverTask.Iterate()
>>> >>           23.59 % WaitOne - 10323* ms - 0 calls - System.Threading.WaitHandle.WaitOne()
>>> >>           8.22 % ReleaseMutex - 3597 ms - 0 calls - System.Threading.Mutex.ReleaseMutex()
>>> >>           0.67 % get_ConnectedTransport - 291 ms - 0 calls - Apache.NMS.ActiveMQ.Transport.Failover.FailoverTransport.get_ConnectedTransport()
>>> >>           0.22 % DoConnect - 97 ms - 0 calls - Apache.NMS.ActiveMQ.Transport.Failover.FailoverTransport.DoConnect()
>>> >>
>>> >> Anybody seen similar issues? This is ActiveMQ 5.3 and NMS 1.2.0.
>>> >>
>>> >> Thanks,
>>> >>
>>> >> Ted C.
>>> >>
>>> >
>>> > This isn't an issue that's been reported yet. Could you raise a new
>>> > Jira issue regarding this? I'd expect that the initial failure would
>>> > cause a spike in CPU but would expect that the reconnect delay would
>>> > cause that to settle down as it increases.
>>> >
>>> > Regards
>>> >
>>> >
>>> > --
>>> > Tim Bish
>>> >
>>> > Open Source Integration: http://fusesource.com
>>> > ActiveMQ in Action: http://www.manning.com/snyder/
>>> >
>>> > Follow me on Twitter: http://twitter.com/tabish121
>>> > My Blog: http://timbish.blogspot.com/
>>> >
>>> >
>>> >
>>>
>>
>
>
--
View this message in context:
http://old.nabble.com/NMS-Failover-transport-pegging-CPU-tp27763465p27825429.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.