Great stuff Jim!

2008/9/10 Jim Gomes <[EMAIL PROTECTED]>:
> FYI, the NMS trunk now has the keep alive support implemented. You can turn
> it on with the URI parameter "wireFormat.MaxInactivityDuration=nnnn" and
> "wireFormat.MaxInactivityDurationInitialDelay=nnnn", where nnnn is the
> number of milliseconds. The initial delay option is optional; you do not
> have to set it at the same time. It should operate just like the Java
> client. I observed that the server will send a KeepAliveInfo command to the
> client periodically, and the client then responds. This should keep the
> socket connection alive even when no messages are flowing. I would be
> willing to bet that this is what the two ActiveMQ servers are doing to each
> other, which is why that solution worked for you.
>
> Best,
> Jim
>
> On Tue, Sep 9, 2008 at 8:32 AM, Bryan Murphy <[EMAIL PROTECTED]> wrote:
>
>> We basically run a server here in our local office behind a firewall, and
>> the rest of our stuff out on Amazon's EC2 cloud. We suspect there were
>> issues with NAT timeouts and half-dead TCP connections.
>> The specific behaviors we saw using NMS manifested themselves in the
>> following ways:
>>
>> 1. Client blocked on TCP connection waiting for messages; server does not
>> think client is connected anymore.
>>
>> 2. Client blocked on TCP connection; server reports *multiple* listeners
>> for a queue that should only have one listener (the number changes over
>> time, tended to tick upwards and then downwards, probably after the server
>> timed out a dead TCP connection; sometimes saw a listener count upwards of
>> 9 or 10 when there should only be 1).
>>
>> 3. Clients do not appear to always re-establish connection to server once
>> connection is dead. Frequently had to restart clients, occasionally had to
>> restart server.
>>
>> 4. Message queues that were idle for long periods at a time exhibited
>> problematic behavior. Message queues that were active remained available
>> (a huge indicator of what was going on after fixing #5).
>>
>> 5. Hitting ^C to kill our application and not handling break to properly
>> close connections caused behaviors very similar to what we were eventually
>> seeing with our TCP connections. This, of course, made the issue that much
>> more confusing and difficult to debug, since not all communication problems
>> were rooted at the network layer, and the results were at least initially
>> maddeningly inconsistent.
>>
>> We experimented with more aggressive request timeouts on the transport
>> layer/session/connection (we even modified the driver to ensure these were
>> getting set), setting up static routes, opening up firewall ports and
>> playing with the TCP timeouts (at least on our end; we have no control over
>> the Amazon side). We tried a prefetch size of one and tried to enable the
>> keep alive but never figured out how to do it. The only solution that
>> worked was the ActiveMQ-to-ActiveMQ bridge, and I suspect some of that may
>> have to do with the fact that we were never able to get keep alives working
>> and we have no control over fine-grained NAT settings on the Amazon side.
>>
>> Bryan
>>
>>
>> On Tue, Sep 9, 2008 at 10:09 AM, James Strachan <[EMAIL PROTECTED]> wrote:
>>
>>> Maybe the WAN is dropping connections; we have failover in Java; am
>>> not sure we've added that to NMS yet, have we?
>>>
>>> 2008/9/9 Jim Gomes <[EMAIL PROTECTED]>:
>>>> Hi Bryan,
>>>> That's interesting. I wonder where the problem is with the ActiveMQ => NMS
>>>> connection. Without knowing your exact network topology, I can't point
>>>> to where the problem is. All I can do is speak to my experience: I have
>>>> been able to keep connections alive for a very long time without errors,
>>>> both with high and low activity, even going over what my infrastructure
>>>> team has told me is a WAN connection.
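Bryan says above that he never figured out how to turn the keep alive on; the options Jim announces at the top of this thread are plain query parameters on the broker URI. The following is a minimal Python sketch of assembling such a URI, assuming the usual `tcp://host:port` form. The `build_broker_uri` helper and the host name are hypothetical; only the two `wireFormat.*` parameter names come from the thread, and 61616 is ActiveMQ's default OpenWire port.

```python
from urllib.parse import urlencode


def build_broker_uri(host, port=61616, max_inactivity_ms=None,
                     initial_delay_ms=None):
    """Return a tcp:// broker URI with optional keep-alive wireFormat options.

    Hypothetical helper for illustration: only the two wireFormat.* parameter
    names come from the thread; 61616 is ActiveMQ's default OpenWire port.
    """
    params = {}
    if max_inactivity_ms is not None:
        # Inactivity window in milliseconds that drives the keep-alive exchange.
        params["wireFormat.MaxInactivityDuration"] = int(max_inactivity_ms)
    if initial_delay_ms is not None:
        # Optional; assumed to delay the first inactivity check (milliseconds).
        params["wireFormat.MaxInactivityDurationInitialDelay"] = int(initial_delay_ms)
    uri = f"tcp://{host}:{port}"
    if params:
        # urlencode leaves the dots in the parameter names unescaped.
        uri += "?" + urlencode(params)
    return uri


if __name__ == "__main__":
    print(build_broker_uri("broker.example.com",
                           max_inactivity_ms=30000, initial_delay_ms=10000))
```

Run as a script, this prints `tcp://broker.example.com:61616?wireFormat.MaxInactivityDuration=30000&wireFormat.MaxInactivityDurationInitialDelay=10000`. Passing only `max_inactivity_ms` is enough, since Jim notes the initial delay is optional; the 30000/10000 values are arbitrary examples, not recommendations.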
>>>>
>>>> Best,
>>>> Jim
>>>>
>>>> On Tue, Sep 9, 2008 at 7:35 AM, Bryan Murphy <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Thanks for the info. I suspected that's what the timeout meant, but you
>>>>> never really know until you ask.
>>>>> Anyway, we finally solved our issue. We set up two instances of ActiveMQ
>>>>> in the two data centers to forward messages back and forth between each
>>>>> other. This is working much better for us. It seems the
>>>>> ActiveMQ-to-ActiveMQ communication is a bit more robust than the
>>>>> ActiveMQ-to-Apache.NMS communication (at least when running over a WAN).
>>>>>
>>>>> Bryan
>>>>>
>>>>> On Mon, Sep 8, 2008 at 2:49 PM, Jim Gomes <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi Bryan,
>>>>>> I can't answer all of your questions yet, but I can answer some of
>>>>>> them.
>>>>>>
>>>>>> 1. As far as the ResponseTimeout property goes, it is used for network
>>>>>> timeouts. It's not a JMS timeout value like TimeToLive. The
>>>>>> ResponseTimeout is used by the client to wait for a response from the
>>>>>> broker. Since a network call is inherently a blocking operation (send
>>>>>> request, wait for response), if we never receive a response from a
>>>>>> dead/hung broker, the client will hang as well. The ResponseTimeout lets
>>>>>> the client abort waiting for the response from the broker. It can be set
>>>>>> to whatever your application's performance constraints require. In a WAN
>>>>>> environment with a lot of latency in network round-trips, it might be
>>>>>> set fairly high. The socket connection is not dropped; the client
>>>>>> simply stops waiting for the broker to respond and goes into its
>>>>>> error-handling code for a non-response.
>>>>>>
>>>>>> 2. I see the marshalling code for the KeepAliveInfo, but like you I
>>>>>> don't see how it is turned on or controlled from the client side. This
>>>>>> would need more investigation to see whether it is enabled via a URI
>>>>>> parameter, or whether new code needs to be written to enable its use.
>>>>>>
>>>>>> 3. Can't answer the server-side socket issue. Don't know that code.
>>>
>>> --
>>> James
>>> -------
>>> http://macstrac.blogspot.com/
>>>
>>> Open Source Integration
>>> http://open.iona.com
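Jim's description of ResponseTimeout (send a request, block for the reply, and on timeout stop waiting without dropping the socket) can be sketched in a few lines of Python. This is an illustrative model only, not NMS code: `PendingRequest` and `ResponseTimeoutError` are hypothetical names, and the reply is delivered by whatever thread reads the socket, simulated here with a timer.

```python
import threading


class ResponseTimeoutError(Exception):
    """Raised when no response arrives within the response timeout (hypothetical)."""


class PendingRequest:
    """One outstanding request waiting for the broker's reply (hypothetical)."""

    def __init__(self):
        self._done = threading.Event()
        self._response = None

    def complete(self, response):
        # Called by the (hypothetical) socket-reader thread when the reply arrives.
        self._response = response
        self._done.set()

    def wait(self, response_timeout_s):
        # Block until the reply arrives or the timeout elapses. On timeout we
        # only stop waiting; nothing here closes the underlying connection.
        if not self._done.wait(response_timeout_s):
            raise ResponseTimeoutError(
                f"no response within {response_timeout_s} s")
        return self._response


if __name__ == "__main__":
    ok = PendingRequest()
    # Simulate a broker that answers after 50 ms.
    threading.Timer(0.05, ok.complete, args=["ack"]).start()
    print(ok.wait(1.0))

    hung = PendingRequest()  # simulates a dead/hung broker that never replies
    try:
        hung.wait(0.1)
    except ResponseTimeoutError as exc:
        print("error-handling path:", exc)
```

The timeout only bounds how long the caller blocks; deciding what to do next (retry, reconnect, report) is left to the error-handling path, which matches Jim's point that the connection itself stays open.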
--
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com
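The keep-alive exchange Jim describes (one side periodically sends a KeepAliveInfo and the other answers, so an idle socket still carries traffic) amounts to bookkeeping over the last read and write times. The Python sketch below is a simplified model under assumed rules: send a keep-alive after half the inactivity window passes without writes, and declare the connection dead after a full window without reads. It is not the NMS or ActiveMQ implementation, and `InactivityMonitor` is a hypothetical class.

```python
class InactivityMonitor:
    """Simplified keep-alive bookkeeping (illustrative, not NMS/ActiveMQ code)."""

    def __init__(self, max_inactivity_ms, initial_delay_ms=0, start_ms=0):
        self.max_inactivity_ms = max_inactivity_ms
        # No checks at all until the (assumed) initial delay has passed.
        self.start_checking_at = start_ms + initial_delay_ms
        self.last_read_ms = start_ms
        self.last_write_ms = start_ms

    def on_read(self, now_ms):
        # Any inbound command, including a KeepAliveInfo reply, counts as activity.
        self.last_read_ms = now_ms

    def on_write(self, now_ms):
        # Call after anything is sent, including the keep-alive itself.
        self.last_write_ms = now_ms

    def check(self, now_ms):
        """Return 'ok', 'send_keepalive' or 'dead' for the given time."""
        if now_ms < self.start_checking_at:
            return "ok"
        if now_ms - self.last_read_ms > self.max_inactivity_ms:
            # Peer silent for a whole window: treat the socket as half-dead.
            return "dead"
        if now_ms - self.last_write_ms >= self.max_inactivity_ms // 2:
            # We have been quiet: send a keep-alive so the peer (and any NAT
            # table in between) sees traffic.
            return "send_keepalive"
        return "ok"
```

Passing the clock in as `now_ms` instead of reading it inside keeps the logic deterministic and easy to test; a real transport would call `check` from a timer and, on `"dead"`, close the socket so that reconnect logic (such as the failover James mentions) can take over.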