On 04/19/2013 10:07 AM, NeilOwens wrote:
I started seeing this mystery disconnection when I enabled the broadcast of
CAN bus messages received to all the nodes in the system.  This is the
source of the data feed (4-5 256-byte messages a second) I talked about in my
original post (but it can be higher).  This isn't actually required by
anything in the system at the moment, but will be needed later, and I want
to replace our current sockets-based messaging system with qpid going
forward, so it's worth my time to fix the problem now.

I wonder if turning up the logging might help shed some light, e.g. log-enable=trace+. That will generate a lot of logs (which may be a problem), but it would at least give us plenty of data to examine.

Since the difference was a data rate thing I theorised that there could
possibly be a backlog in the system.  Anyway, I added the 10 second queue
purge to prevent the queues from getting too big in case that was the issue.
If one of the receiving processes crashed for any reason I saw a fast
buildup of messages.  We only have 256 MB of memory on our smallest units, so we need
to keep a tight lid on memory usage.  Anyway, that was when I looked into
the defaults of TTL of messages, purge interval and queue size, and applied
those settings.  I'm setting a 5s TTL on the sent messages.
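A rough way to size the worst case for one queue with these settings. This is a back-of-the-envelope sketch that assumes the queue has no subscriber, so expired messages are only removed by the periodic purge. Each message can then linger for up to TTL + purge interval. Per-message broker overhead is ignored, so the real figure will be higher than the payload total.

```python
def worst_case_backlog_bytes(rate_per_s, msg_bytes, ttl_s, purge_interval_s):
    """Upper bound on payload bytes held in one unconsumed queue.

    Assumption: no subscriber, so a message expires after ttl_s but is only
    removed at the next purge, i.e. up to ttl_s + purge_interval_s later.
    Broker bookkeeping and header overhead are NOT included.
    """
    residency_s = ttl_s + purge_interval_s
    return rate_per_s * residency_s * msg_bytes


# Figures from this thread: ~5 msgs/s of 256 bytes, 5 s TTL, 10 s purge.
print(worst_case_backlog_bytes(5, 256, 5, 10))  # 19200
```

The bound scales linearly with the rate, so a burstier CAN feed raises it proportionally, and each queue on each node carries its own copy.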

The ring queue is another good way to bound the memory consumption. You can set per-queue limits as well if you like, to further restrict the backlog that can build up.
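To illustrate the ring behaviour, here is a toy model. `RingQueue` is a hypothetical class, not qpidd code: once either a message-count limit or a byte limit is exceeded, the oldest messages are discarded, so memory stays bounded and the sender never blocks.

```python
from collections import deque


class RingQueue:
    """Toy model of a ring-policy queue: on overflow, drop the oldest.

    max_count and max_bytes loosely mirror per-queue count/size limits.
    Illustration only; qpidd's own ring implementation differs.
    """

    def __init__(self, max_count, max_bytes):
        self.max_count = max_count
        self.max_bytes = max_bytes
        self.items = deque()
        self.total_bytes = 0
        self.dropped = 0

    def push(self, payload: bytes):
        self.items.append(payload)
        self.total_bytes += len(payload)
        # Evict from the head until both limits hold again. A single payload
        # larger than max_bytes ends up evicting itself.
        while len(self.items) > self.max_count or self.total_bytes > self.max_bytes:
            old = self.items.popleft()
            self.total_bytes -= len(old)
            self.dropped += 1


q = RingQueue(max_count=3, max_bytes=1024)
for i in range(5):
    q.push(bytes([i]) * 256)
print(len(q.items), q.dropped, q.total_bytes)  # 3 2 768
```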

Handling expired messages isn't as efficient in some cases. If the queue has a subscriber, the external reaper doesn't need to do anything; expired messages are dropped using the IO thread of the subscriber. However, when that is not the case, the timer thread sweeps through the queue looking for messages to expire. For a large backlog that can take some time. It doesn't sound like that is the obvious culprit here, though...
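The two expiry paths can be modelled roughly as follows. `ExpiringQueue` is a hypothetical simplification, not the broker's code. The subscriber path only pays for the expired messages it skips at the head while dequeuing. The timer path has to scan the whole queue, so its cost grows with the backlog.

```python
from collections import deque


class ExpiringQueue:
    """Toy model of subscriber-driven vs timer-driven expiry.

    Entries are (expires_at, payload). Time is passed in explicitly so the
    behaviour is deterministic; a real broker uses its own clock.
    """

    def __init__(self):
        self.items = deque()

    def push(self, payload, now, ttl):
        self.items.append((now + ttl, payload))

    def pop_live(self, now):
        """Subscriber path: discard expired entries at the head, return the first live one."""
        while self.items:
            expires_at, payload = self.items.popleft()
            if expires_at > now:
                return payload
        return None

    def sweep(self, now):
        """Timer path: scan every entry (O(backlog)) and return how many expired."""
        before = len(self.items)
        self.items = deque(e for e in self.items if e[0] > now)
        return before - len(self.items)


q = ExpiringQueue()
q.push("a", now=0, ttl=5)
q.push("b", now=1, ttl=5)
q.push("c", now=8, ttl=5)
print(q.sweep(now=7))  # 2 ("a" and "b" have expired)
```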

[...]
Thanks for the tips on qpid.policy.  I had a lot of trouble finding specs
for the format of that string.

There is a wiki page that may be helpful: https://cwiki.apache.org/confluence/display/qpid/Addressing+Examples and the programming guide has some description of the addressing options: http://qpid.apache.org/books/0.20/Programming-In-Apache-Qpid/html/.
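As a worked example of that addressing syntax, here is a small helper that builds an address string declaring a ring queue with optional limits. The helper name and the queue name `can-feed` are hypothetical. The argument keys (`qpid.policy_type`, `qpid.max_count`, `qpid.max_size`) are the queue-declare arguments qpidd accepted around 0.20, but check them against the pages above for your broker version. The resulting string would be passed to `session.sender()` or `session.receiver()`.

```python
def ring_queue_address(name, max_count=None, max_size=None):
    """Build an address string that creates a ring-policy queue.

    Assumption: qpidd understands 'qpid.policy_type', 'qpid.max_count' and
    'qpid.max_size' as x-declare arguments (verify for your version).
    """
    args = ["'qpid.policy_type': ring"]
    if max_count is not None:
        args.append("'qpid.max_count': %d" % max_count)
    if max_size is not None:
        args.append("'qpid.max_size': %d" % max_size)
    return ("%s; {create: always, node: {type: queue, "
            "x-declare: {arguments: {%s}}}}" % (name, ", ".join(args)))


print(ring_queue_address("can-feed", max_count=100, max_size=65536))
```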

[...]
The really odd thing is that when this fault happens it appears as if the
network interfaces shut down on the target.  I can't telnet into them, the
applications slow down.

If the network interfaces did shut down that would certainly cause a timeout. I can't think how qpidd could cause that though. The only thing I could think of would be memory exhaustion. Do you know what the memory usage was at that time (or is that something you could record)?
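One way to record memory usage on the target is to log it to a local file, which still works if the network becomes unreachable. The sketch below assumes a Linux `/proc` filesystem and that you know qpidd's PID. The function names and the log path are illustrative. It samples the broker's resident set size (`VmRSS`) and the system's free memory (`MemFree`).

```python
import time


def parse_kb(text, key):
    """Return the kB value from a 'Key:   1234 kB' line, or None if absent."""
    for line in text.splitlines():
        if line.startswith(key + ":"):
            return int(line.split()[1])
    return None


def sample_memory(pid):
    """Read a process's resident set size and system free memory from /proc."""
    with open("/proc/%d/status" % pid) as f:
        rss_kb = parse_kb(f.read(), "VmRSS")
    with open("/proc/meminfo") as f:
        free_kb = parse_kb(f.read(), "MemFree")
    return rss_kb, free_kb


def log_memory(pid, interval_s=5.0, out_path="qpidd-mem.log"):
    """Append a timestamped sample every interval_s seconds until interrupted."""
    with open(out_path, "a") as out:
        while True:
            rss_kb, free_kb = sample_memory(pid)
            out.write("%d rss_kb=%s free_kb=%s\n" % (int(time.time()), rss_kb, free_kb))
            out.flush()  # keep the file current in case the box hangs
            time.sleep(interval_s)
```

Run `log_memory(<qpidd pid>)` before reproducing the fault, then compare the last samples with the time the interfaces stopped responding.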

