Hi Ted,

I don't have any flow control configured that I'm aware of. Will send the logs
separately.
Cheers,
Jimmy

----- Original Message -----
From: Ted Ross
Sent: 09/06/13 02:02 PM
To: [email protected]
Subject: Re: System stalling

Jimmy,

Do your ring queues have any flow-control configuration set up? This would be
--flow-* thresholds in qpid-config. Also, it would be helpful to see the
output of a pstack on the qpidd process when the condition occurs. I think
almost everything happens under DispatchHandle::processEvent :)

-Ted

On 09/06/2013 09:50 AM, Jimmy Jones wrote:
> I've done some further digging and managed to simplify the system a little
> to reproduce the problem. The system is now an external process that posts
> messages to the default headers exchange on my machine, a ring queue that
> receives effectively all messages from that exchange, and a process that
> consumes them and posts to another headers exchange. There is now nothing
> listening on the subsequent headers exchange, and all exchanges are
> non-durable. I've also tried Fraser's suggestion of marking the link as
> unreliable on the queue, which seems to have no effect (is there any way in
> the qpid utilities to confirm the link has been set to unreliable?)
>
> So essentially what happens is that the system happily processes away,
> normally with an empty ring queue; sometimes it spikes up a bit and goes
> back down again, with my ingest process using ~70% CPU and qpidd ~50% CPU,
> on a machine with 8 CPU cores. However, sometimes the queue spikes up to
> 2GB (the max) and starts throwing messages away, qpidd hits 100%+ CPU and
> the ingest process drops to about 3% CPU. I can see messages are being
> processed very slowly.
>
> I've tried attaching to qpidd with gdb a few times, and all threads apart
> from one seem to be idle in epoll_wait or pthread_cond_wait. The running
> thread always seems to be somewhere under DispatchHandle::processEvent.
>
> I'm at a bit of a loss for what I can do to fix this!
>
> Jimmy
>
> ----- Original Message -----
> From: Fraser Adams
> Sent: 08/23/13 09:09 AM
> To: [email protected]
> Subject: Re: System stalling
>
> Hi Jimmy, hope you are well!
>
> As an experiment, one thing that you could try is messing with the link
> "reliability". As you know, in the normal mode of operation it's necessary
> to periodically send acknowledgements from the consumer client application,
> which then ultimately get passed back to the broker.
>
> I'm no expert on this, but from my recollection, if you are in a position
> where circular queues are overflowing, you are continually trying to
> produce and consume, and you have some fair level of prefetch/capacity on
> the consumer, then the mechanism for handling the acknowledgements on the
> broker is "sub-optimal" - I think it's a linear search or some such, and
> there are conditions where catching up with acknowledgements becomes a bit
> "N squared".
>
> Gordon would be able to explain this way better than me - that's assuming
> this hypothesis is even relevant :-)
>
> Anyway, try having a link: {reliability: unreliable} stanza in your
> consumer address string (as an example, one of mine looks like the
> following - the address string syntax isn't exactly trivial :-)).
>
> string address = "test_consumer; {create: receiver, node: {x-declare:
> {auto-delete: True, exclusive: True, arguments: {'qpid.policy_type': ring,
> 'qpid.max_size': 100000000}}, x-bindings: [{exchange: 'amq.match', queue:
> 'test_consumer', key: 'test1', arguments: {x-match: all, data-format:
> test}}]}, link: {reliability: unreliable}}";
>
> Clearly your arguments would be different, but hopefully it'll give you a
> kick start.
>
> The main downside of disabling link reliability is that if you have enabled
> prefetch and the consumer unexpectedly dies, then all of the messages on
> the prefetch queue will be lost, whereas with reliable messaging the broker
> maintains references to all unacknowledged messages and so would resend
> them (I *think* that's how it works...)
>
> At the very least it's a fairly simple tweak to your consumer addresses
> that might rule out (or point to) acknowledgement shenanigans as the root
> of your problem. From my own experience I always end up blaming this first
> if I hit performance weirdness with ring queues :-)
>
> HTH,
> Frase
>
> On 21/08/13 17:08, Jimmy Jones wrote:
>>>>>> I've got a simple processing system using the 0.22 C++ broker, all on
>>>>>> one box, where an external system posts messages to the default
>>>>>> headers exchange, and an ingest process receives them using a ring
>>>>>> queue, transforms them and outputs to a different headers exchange.
>>>>>> Various other processes pick messages of interest off that exchange
>>>>>> using ring queues. Recently, however, the system has been stalling -
>>>>>> I'm still receiving lots of data from the other system, but the ingest
>>>>>> process suddenly goes to <5% CPU usage, its queue fills up and
>>>>>> messages start getting discarded from the ring, the follow-on
>>>>>> processes go to practically 0% CPU, and qpidd hovers around 95-120%
>>>>>> CPU (normally it's ~75%) while the rest of the system pretty much goes
>>>>>> idle (no swapping, there is free memory).
>>>>>>
>>>>>> I attached to the ingest process with gdb and it was stuck in send
>>>>>> (waitForCapacity/waitForCompletionImpl) - I notice this can block.
>>>>> Is there any queue bound to the second headers exchange, i.e. to the
>>>>> one this ingest process is sending to, that is not a ring queue? (If
>>>>> you run qpid-config queues -r, you get a quick listing of the queues
>>>>> and their bindings.)
>>>> I've run qpid-config queues, and all my queues have
>>>> --limit-policy=ring, apart from a UUID one which I presume is
>>>> qpid-config itself. Are there any other useful debugging things I can
>>>> do?
>>> What does qpid-stat -q show? Is it possible to test whether the broker
>>> is still responsive, e.g. by sending and receiving messages through a
>>> test queue/exchange? Are there any errors in the logs? Are any of the
>>> queues durable (and messages persistent)?
>> qpid-stat -q shows all zeros in the msg and bytes columns, apart from the
>> ingest queue and another overflowing ring queue I have.
>>
>> I did run qpid-tool when the system was broken to dump some stats.
>> msgTotalDequeues was slowly incrementing on the ingest queue, so I presume
>> messages were still being delivered and the broker was responsive?
>>
>> The only logging I've got is syslog, and I just see a warning about
>> unsent data, presumably when the ingest process receives a SIGALRM. I'm
>> happy to switch on more logging - what would you recommend?
>>
>> None of my queues are durable, but I think incoming messages from the
>> other system are marked as durable. The exchange that the ingest process
>> sends to is durable, but I'm not setting any durable flags on outgoing
>> messages (I presume the default is off).
>>
>>> Another thing might be a ptrace of the broker process. Maybe two or
>>> three with a short delay between them.
>> I'll try this next time it goes haywire.
>>
>>> For some reason it seems like the broker is not sending back
>>> confirmation to the sender in the ingest process, causing that to block.
>>> Ring queues shouldn't be subject to producer flow control, so we need to
>>> figure out what other reason there could be for that.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
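
For concreteness, a minimal C++ (qpid::messaging) consumer sketch along the
lines of Fraser's unreliable-link suggestion might look like the following.
The broker URL, the capacity value and the queue/binding details (borrowed
loosely from his example address) are illustrative assumptions, not taken
from Jimmy's actual setup:

// Consumer sketch: declares a ring queue bound to amq.match and reads from
// it over an unreliable link, so the broker settles messages at delivery
// instead of tracking per-message acknowledgement state for this link.
#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Receiver.h>
#include <qpid/messaging/Message.h>
#include <qpid/messaging/Duration.h>
#include <iostream>
#include <string>

using namespace qpid::messaging;

int main() {
    // Address modelled on Fraser's example; queue name, binding key and
    // match arguments are placeholders.
    const std::string address =
        "test_consumer; {create: receiver, "
        "node: {x-declare: {auto-delete: True, exclusive: True, "
        "arguments: {'qpid.policy_type': ring, 'qpid.max_size': 100000000}}, "
        "x-bindings: [{exchange: 'amq.match', queue: 'test_consumer', "
        "key: 'test1', arguments: {x-match: all, data-format: test}}]}, "
        "link: {reliability: unreliable}}";

    Connection connection("localhost:5672");  // assumed broker URL
    try {
        connection.open();
        Session session = connection.createSession();
        Receiver receiver = session.createReceiver(address);
        receiver.setCapacity(1000);  // prefetch; lost if the consumer dies

        Message message;
        while (receiver.fetch(message, Duration::SECOND)) {
            std::cout << "got: " << message.getContent() << std::endl;
            // On an unreliable link there is nothing for this to release
            // broker-side; on a reliable link this is what settles the
            // prefetched messages.
            session.acknowledge();
        }
        connection.close();
    } catch (const std::exception& e) {
        std::cerr << "consumer error: " << e.what() << std::endl;
        connection.close();
        return 1;
    }
    return 0;
}

The trade-off is the one Fraser describes: anything sitting in the prefetch
buffer is lost if the consumer dies, but the broker no longer has to manage
acknowledgement state for this link.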

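A companion producer sketch, again only illustrative (the exchange name,
property key, capacity and message count are assumptions), showing where a
sender can block waiting for broker confirmation (the
waitForCapacity/waitForCompletionImpl state Jimmy saw in gdb) and that
message durability is set per message, independently of the exchange being
durable:

// Producer sketch: publishes explicitly non-durable messages to a headers
// exchange and bounds the number of unconfirmed messages via sender capacity.
#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Sender.h>
#include <qpid/messaging/Message.h>
#include <iostream>
#include <sstream>

using namespace qpid::messaging;

int main() {
    Connection connection("localhost:5672");  // assumed broker URL
    try {
        connection.open();
        Session session = connection.createSession();
        // amq.match is the default headers exchange; the exchange being
        // durable says nothing about whether individual messages are durable.
        Sender sender = session.createSender("amq.match");
        sender.setCapacity(500);  // max messages awaiting broker confirmation

        for (int i = 0; i < 10000; ++i) {
            std::ostringstream body;
            body << "payload " << i;
            Message message(body.str());
            message.setDurable(false);  // the default, made explicit here
            message.getProperties()["data-format"] = "test";  // matched by the binding
            // send() blocks here once 'capacity' messages are outstanding
            // and the broker has not yet confirmed them.
            sender.send(message);
        }
        session.sync();  // wait until everything sent so far is confirmed
        connection.close();
    } catch (const std::exception& e) {
        std::cerr << "producer error: " << e.what() << std::endl;
        connection.close();
        return 1;
    }
    return 0;
}

With a bounded sender capacity, a broker that stops confirming transfers
stalls the producer inside send() rather than letting client-side buffers
grow without limit, which would be consistent with the ingest process
dropping to ~3% CPU while qpidd stays busy.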