Gordon, thank you for your response. Yes, the test with the larger message size has failed every time. I will look into sending the trace-level logs and a thread dump from the client on our system.
thanks,
Tom Maggio
Raytheon Co.
Dallas, TX
972/205-4377

On Wed, Dec 7, 2011 at 8:37 AM, Gordon Sim <[email protected]> wrote:

> On 12/07/2011 01:55 AM, Tom M wrote:
>
>> Hello,
>>
>> we are having a problem with our MRG (qpid) system:
>>
>> * when sending messages with size of 1600bytes, a connection (used for
>> sending from client) does not detect the host connection is lost via
>> heartbeat timeout.
>>
>> + we are using C++ qpid client 0.7 and qpidd 0.7 (linux 2.6 x86_64 on both
>> client and broker hosts)
>>
>> and Ethernet connection (TCP/IP) between hosts
>>
>> + for this connection we have: ConnectionSettings
>> connectionSettings.heartbeat = 8
>>
>> + simulating a system failure by pulling the ethernet cable to the
>> broker host
>>
>> + the connection close Exception is caught by the client after many
>> minutes (6 to 20mins), I'm guessing this is due to the TCP timeout and not
>> the missed heartbeats.
>>
>> + with the same exact application (for our client), if sending messages
>> of 200bytes, we do get the qpid exception indicating the Connection closed
>> (catch TransportFailure Exception: connection closed) within 16 seconds.
>> For this testing, there were no other changes between the 2 cases, other
>> than the size of the messages sent from the client (only expanded the size
>> of the string in the body of the message) (1 message sent per second in
>> both cases).
>>
>> * is this a known problem with qpid 0.7?
>
> No, i don't think this is a known issue.
>
>> * is there patch to fix this for qpid 0.7?
>>
>> * has this problem already been fixed in later releases?
>>
>> NOTE: we have already deployed qpid 0.7 in our system, and we will not be
>> able to upgrade to a newer full release for many months.
>>
>> I'm wondering if the problem is that the connection gets blocked with the
>> first TCP packet of a multiple packet message, such that the heartbeat
>> detection is disabled until the full message is sent. But, if the
>> multi-packet message can not complete (since socket is broken), the
>> heartbeat logic is held disabled until the multi-packet message can
>> complete (which in this case it can not).
>
> There is nothing that directly (intentionally) does anything like this.
> However it may be possible that there is some deadlock or liveness issue
> that prevents correct function in some cases.
>
> Is the test always failing with the larger message size? There is actually
> no difference in the AMQP framing for a 200 byte v a 1600 byte message. It
> may just be that the different timing of the larger write somehow triggers
> the issue.
>
> Can you get trace level logs and a thread dump from the client for a
> failed case?
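
For reference, the working case quoted above (heartbeat = 8, connection closed within 16 seconds) is consistent with a check that treats the connection as dead once two heartbeat intervals pass with no incoming traffic. Below is a minimal sketch of such a check. It is not qpid's actual implementation: the class `HeartbeatWatchdog`, its methods, and the factor of 2 are assumptions made for illustration only.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical heartbeat watchdog; NOT part of the qpid client API.
// Assumption: the connection is considered lost after 2 heartbeat
// intervals without any incoming traffic (8s heartbeat -> 16s timeout).
class HeartbeatWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    HeartbeatWatchdog(std::chrono::seconds heartbeat, Clock::time_point now)
        : timeout_(2 * heartbeat), lastTraffic_(now) {
        assert(heartbeat.count() > 0);  // a zero interval would expire at once
    }

    // Call whenever any frame (data or heartbeat) arrives from the peer.
    void onTraffic(Clock::time_point now) { lastTraffic_ = now; }

    // Must be polled from a timer that keeps running even while a send
    // is blocked in the socket layer; otherwise a stalled write would
    // also stall failure detection.
    bool isExpired(Clock::time_point now) const {
        return now - lastTraffic_ >= timeout_;
    }

private:
    std::chrono::seconds timeout_;
    Clock::time_point lastTraffic_;
};
```

The point of the sketch is the comment on `isExpired`: if the check only runs on the same path as a write that is stuck on a broken socket, detection would fall back to the TCP timeout, which matches the 6 to 20 minute delay reported for the larger messages.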
