Hi Ted,

I don't have any flow control that I'm aware of. Will send the logs separately.

Cheers,

Jimmy
  
----- Original Message -----
From: Ted Ross
Sent: 09/06/13 02:02 PM
To: [email protected]
Subject: Re: System stalling
 Jimmy,

Do your ring queues have any flow-control configuration set up? These would
be the --flow-* thresholds in qpid-config.
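
(For reference, if such limits were set via the address string rather than
qpid-config, they'd show up as x-declare arguments, roughly like the sketch
below - the argument names are from the producer flow control docs as I
remember them, and the values are purely illustrative:)

string address = "my_queue; {create: receiver, node: {x-declare: {arguments:
{'qpid.flow_stop_size': 80000000, 'qpid.flow_resume_size': 70000000,
'qpid.flow_stop_count': 100000, 'qpid.flow_resume_count': 80000}}}}";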

Also, it would be helpful to see the output of a pstack on the qpidd 
process when the condition occurs. I think almost everything happens 
under DispatchHandle::processEvent :)

-Ted

On 09/06/2013 09:50 AM, Jimmy Jones wrote:
> I've done some further digging, and managed to simplify the system a little
> to reproduce the problem. The system is now an external process that posts
> messages to the default headers exchange on my machine; a ring queue receives
> effectively all messages from that exchange, and my ingest process consumes
> them, transforms them, and posts to another headers exchange. There is now
> nothing listening on that second headers exchange, and all exchanges are
> non-durable. I've also tried Fraser's suggestion of marking the link as
> unreliable on the queue, which seems to have no effect (is there any way in
> the qpid utilities to confirm the link has been set to unreliable?).
>
> So essentially the system happily processes away, normally with an empty
> ring queue; sometimes it spikes up a bit and goes back down again, with my
> ingest process using ~70% CPU and qpidd ~50% CPU, on a machine with 8 CPU
> cores. However, sometimes the queue spikes up to 2GB (the max) and starts
> throwing messages away, qpidd hits 100%+ CPU, and the ingest process drops
> to about 3% CPU. I can see messages are being processed, but very slowly.
>
> I've tried attaching to qpidd with gdb a few times, and all threads apart 
> from one seem to be idle in epoll_wait or pthread_cond_wait. The running 
> thread always seems to be somewhere under DispatchHandle::processEvent.
>
> I'm at a bit of a loss for what I can do to fix this!
>
> Jimmy
> 
> ----- Original Message -----
> From: Fraser Adams
> Sent: 08/23/13 09:09 AM
> To: [email protected]
> Subject: Re: System stalling
> Hi Jimmy, hope you are well!
> As an experiment, one thing that you could try is messing with the link
> "reliability". As you know, in the normal mode of operation it's
> necessary to periodically send acknowledgements from the consumer client
> application, which ultimately get passed back to the broker.
>
> I'm no expert on this, but from my recollection, if you are in a position
> where circular queues are overflowing, you are continually trying to
> produce and consume, and you have some fair level of prefetch/capacity on
> the consumer, then the mechanism for handling the acknowledgements on the
> broker is "sub-optimal" - I think it's a linear search or some such, and
> there are conditions where catching up with acknowledgements becomes a bit
> "N squared".
>
> Gordon would be able to explain this way better than me - that's
> assuming this hypothesis is even relevant :-)
>
> Anyway, try adding a link: {reliability: unreliable} stanza to your
> consumer address string (as an example, one of mine looks like the
> following - the address string syntax isn't exactly trivial :-)).
>
> string address = "test_consumer; {create: receiver, node: {x-declare:
> {auto-delete: True, exclusive: True, arguments: {'qpid.policy_type':
> ring, 'qpid.max_size': 100000000}}, x-bindings: [{exchange: 'amq.match',
> queue: 'test_consumer', key: 'test1', arguments: {x-match: all,
> data-format: test}}]}, link: {reliability: unreliable}}";
>
> Clearly your arguments would be different but hopefully it'll give you a
> kick start.
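>
> (For what it's worth, here's a rough, self-contained sketch of plugging an
> address like that into the C++ qpid::messaging API. The broker URL, queue
> name, address arguments and capacity are placeholders, and I haven't
> compiled it - treat it as a starting point only.)
>
> #include <qpid/messaging/Connection.h>
> #include <qpid/messaging/Session.h>
> #include <qpid/messaging/Receiver.h>
> #include <qpid/messaging/Message.h>
> #include <qpid/messaging/Duration.h>
>
> using namespace qpid::messaging;
>
> int main() {
>     Connection connection("localhost:5672");  // placeholder broker URL
>     connection.open();
>     Session session = connection.createSession();
>     // Address string carrying the link: {reliability: unreliable} stanza
>     Receiver receiver = session.createReceiver(
>         "test_consumer; {create: receiver, node: {x-declare:"
>         " {auto-delete: True, arguments: {'qpid.policy_type': ring,"
>         " 'qpid.max_size': 100000000}}}, link: {reliability: unreliable}}");
>     receiver.setCapacity(100);  // prefetch - see the caveat below
>     Message message;
>     while (receiver.fetch(message, Duration::SECOND)) {
>         // ... process the message ...
>         session.acknowledge();  // I believe this is effectively a no-op
>                                 // on an unreliable link, but it's harmless
>     }
>     connection.close();
>     return 0;
> }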
>
>
> The main downside of disabling link reliability is that if you have
> enabled prefetch and the consumer unexpectedly dies, then all of the
> messages on the prefetch queue will be lost, whereas with reliable
> messaging the broker maintains references to all unacknowledged messages
> and so would resend them (I *think* that's how it works.....)
>
>
> At the very least it's a fairly simple tweak to your consumer addresses
> that might rule out (or point to) acknowledgement shenanigans as being
> the root of your problem. From my own experience I always end up blaming
> this first if I hit performance weirdness with ring queues :-)
>
> HTH,
> Frase
>
>
>
> On 21/08/13 17:08, Jimmy Jones wrote:
>>>>>> I've got a simple processing system using the 0.22 C++ broker, all
>>>>>> on one box, where an external system posts messages to the default
>>>>>> headers exchange, and an ingest process receives them using a ring
>>>>>> queue, transforms them and outputs to a different headers exchange.
>>>>>> Various other processes pick messages of interest off that exchange
>>>>>> using ring queues. Recently, however, the system has been stalling -
>>>>>> I'm still receiving lots of data from the other system, but the
>>>>>> ingest process suddenly goes to <5% CPU usage, its queue fills up,
>>>>>> and messages start getting discarded from the ring; the follow-on
>>>>>> processes go to practically 0% CPU, qpidd hovers around 95-120% CPU
>>>>>> (normally it's ~75%), and the rest of the system pretty much goes
>>>>>> idle (no swapping, there is free memory).
>>>>>>
>>>>>> I attached to the ingest process with gdb and it was stuck in send
>>>>>> (waitForCapacity/waitForCompletionImpl) - I notice this can block.
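>>>>>>
>>>>>> (Rough sketch of how that blocking relates to sender capacity in
>>>>>> qpid::messaging - this assumes an already-open Session "session" and a
>>>>>> Message "message"; the target address and capacity are placeholders:)
>>>>>>
>>>>>> Sender sender = session.createSender("my.exchange/my.key");
>>>>>> sender.setCapacity(1000);  // max sends in flight before send() blocks
>>>>>> // Once every slot is taken and the broker hasn't confirmed completion,
>>>>>> // send() waits inside waitForCapacity/waitForCompletionImpl.
>>>>>> if (sender.getUnsettled() >= sender.getCapacity()) {
>>>>>>     // all capacity is in flight; a send() here would stall
>>>>>> }
>>>>>> sender.send(message);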
>>>>> Is there any queue bound to the second headers exchange, i.e. to the one
>>>>> this ingest process is sending to, that is not a ring queue? (If you run
>>>>> qpid-config queue -r, you get a quick listing of the queues and their
>>>>> bindings).
>>>> I've run qpid-config queue, and all my queues have --limit-policy=ring,
>>>> apart from a UUID one which I presume is qpid-config itself. Are there
>>>> any other useful debugging things I can do?
>>> What does qpid-stat -q show? Is it possible to test whether the broker
>>> is still responsive, e.g. by sending and receiving messages through a
>>> test queue/exchange? Are there any errors in the logs? Are any of the
>>> queues durable (and messages persistent)?
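>>>
>>> (e.g. something as small as the sketch below, using a throwaway queue -
>>> the queue name and broker URL are placeholders, and it assumes the usual
>>> qpid/messaging headers; it throws if nothing comes back within 5 seconds:)
>>>
>>> Connection c("localhost:5672");
>>> c.open();
>>> Session s = c.createSession();
>>> Receiver rcv = s.createReceiver("probe.q; {create: always, delete: always}");
>>> Sender snd = s.createSender("probe.q");
>>> snd.send(Message("ping"));
>>> Message m = rcv.fetch(Duration::SECOND * 5);  // round trip through broker
>>> s.acknowledge();
>>> c.close();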
>> qpid-stat -q is all zeros in the msg & bytes columns, apart from the
>> ingest queue, and another overflowing ring queue I have.
>>
>> I did run qpid-tool when the system was broken to dump some stats.
>> msgTotalDequeues was slowly incrementing on the ingest queue, so I presume
>> messages were still being delivered and the broker was responsive?
>>
>> The only logging I've got is syslog, and I just see a warning about unsent
>> data, presumably when the ingest process receives a SIGALRM. I'm happy to
>> switch on more logging - what would you recommend?
>>
>> None of my queues are durable, but I think incoming messages from the
>> other system are marked as durable. The exchange that the ingest process
>> sends to is durable, but I'm not setting any durable flags on outgoing
>> messages (I presume the default is off).
>>
>>> Another thing might be a pstack of the broker process. Maybe two or
>>> three with a short delay between them.
>> I'll try this next time it goes haywire.
>>
>>> For some reason it seems like the broker is not sending back
>>> confirmation to the sender in the ingest process, causing that to block.
>>> Ring queues shouldn't be subject to producer flow control so we need to
>>> figure out what other reason there could be for that.

