Great stuff Jim!

2008/9/10 Jim Gomes <[EMAIL PROTECTED]>:
> FYI, the NMS trunk now has keep-alive support implemented.  You can turn
> it on with the URI parameters "wireFormat.MaxInactivityDuration=nnnn" and
> "wireFormat.MaxInactivityDurationInitialDelay=nnnn", where nnnn is the
> number of milliseconds.  The initial delay parameter is optional and does
> not have to be set along with the first one.  It should operate just like
> the Java client.  I observed that the server will send a KeepAliveInfo
> command to the client periodically.  The client then responds.  This should
> keep the socket connection alive even when no messages are flowing.  I would
> be willing to bet that this is what the two ActiveMQ servers are doing to
> each other, which is why that solution worked for you.
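
To make the two parameters concrete, here is a minimal sketch (Python, and it only builds the string) of a broker URI with the keep-alive options set.  The host name, the tcp:// scheme, port 61616, the helper name and the millisecond values are assumptions for illustration; only the two wireFormat parameter names come from Jim's note.

```python
from urllib.parse import urlencode


def keepalive_broker_uri(host, port, max_inactivity_ms, initial_delay_ms=None):
    """Build a broker URI carrying the keep-alive wire format parameters.

    Hypothetical helper: only the two parameter names are taken from the
    thread; host, port and scheme are placeholders.
    """
    params = {"wireFormat.MaxInactivityDuration": max_inactivity_ms}
    # The initial delay is optional, so it is only added when given.
    if initial_delay_ms is not None:
        params["wireFormat.MaxInactivityDurationInitialDelay"] = initial_delay_ms
    return f"tcp://{host}:{port}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host and values, in milliseconds.
    print(keepalive_broker_uri("broker.example.com", 61616, 30000, 10000))
```

The resulting string would then be passed to whatever connection factory the client uses; leaving out the last argument simply omits the initial delay parameter.
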
>
> Best,
> Jim
>
> On Tue, Sep 9, 2008 at 8:32 AM, Bryan Murphy <[EMAIL PROTECTED]> wrote:
>
>> We basically run a server here in our local office behind a firewall, and
>> the rest of our stuff out on Amazon's EC2 cloud.  We suspect there were
>> issues with NAT timeouts and half dead TCP connections.
>> The specific behaviors we saw using NMS manifested themselves in the
>> following ways:
>>
>> 1. Client blocked on TCP connection waiting for messages, server does not
>> think client is connected anymore.
>>
>> 2. Client blocked on TCP connection, server reports *multiple* listeners
>> for a queue that should only have one listener (the number changes over
>> time, tended to tick upwards, and then downwards, probably after the
>> server timed out a dead TCP connection; we sometimes saw a listener count
>> upwards of 9 or 10 when there should only be 1).
>>
>> 3. Clients do not appear to always re-establish connection to server once
>> connection is dead.  Frequently had to restart clients, occasionally had to
>> restart server.
>>
>> 4. Message queues that were idle for long periods at a time exhibited
>> problematic behavior.  Message queues that were active remained available
>> (a huge indicator of what was going on after fixing #5).
>>
>> 5. Hitting ^C to kill our application and not handling break to properly
>> close connections caused behaviors very similar to what we were eventually
>> seeing with our TCP connections.  This, of course, made the issue that much
>> more confusing and difficult to debug since not all communication problems
>> were rooted at the network layer and the results were at least initially
>> maddeningly inconsistent.
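
One way to take the ^C problem from item 5 out of the picture is to make sure the connection is closed even when the process is interrupted.  A minimal Python sketch, assuming a hypothetical connection object with a close() method (FakeConnection is a stand-in for illustration, not an NMS class):

```python
class FakeConnection:
    """Hypothetical stand-in for a broker connection."""

    def __init__(self):
        self.closed = False

    def close(self):
        # A real client would tell the broker it is going away here.
        self.closed = True


def run_consumer(connection, work):
    """Run the consumer loop and always close the connection afterwards."""
    try:
        work()
    except KeyboardInterrupt:
        # In Python, ^C surfaces as KeyboardInterrupt in the main thread.
        pass
    finally:
        # Runs on normal exit, on ^C, and on any other exception.
        connection.close()
```

With something like this in place, a dangling listener on the server after ^C is more likely to point at the network than at the application.
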
>>
>> We experimented with more aggressive request timeouts on the transport
>> layer/session/connection (even modified the driver to ensure these were
>> getting set), setting up static routes, opening up firewall ports and
>> playing with the TCP timeouts (at least on our end, we have no control on
>> the Amazon side).  We tried a prefetch size of one and tried to enable
>> keep alive but never figured out how to do it.  The only solution that
>> worked was the ActiveMQ to ActiveMQ bridge, and I suspect some of that has
>> to do with the fact that we were never able to get keep alives working and
>> have no control over fine-grained NAT settings on the Amazon side.
>>
>> Bryan
>>
>>
>> On Tue, Sep 9, 2008 at 10:09 AM, James Strachan <[EMAIL PROTECTED]> wrote:
>>
>> > Maybe the WAN is dropping connections; we have failover in Java; I'm
>> > not sure we've added that to NMS yet, have we?
>> >
>> > 2008/9/9 Jim Gomes <[EMAIL PROTECTED]>:
>> > > Hi Bryan,
>> > > That's interesting.  I wonder where the problem is with ActiveMQ => NMS
>> > > connection.  Without knowing your exact network topology, I can't point
>> > > to where the problem is.  All I can do is speak to my experience and I
>> > > have been able to keep connections alive for a very long time without
>> > > errors, both with high- and low-activity, even going over what my
>> > > infrastructure team has told me is a WAN connection.
>> > >
>> > > Best,
>> > > Jim
>> > >
>> > > On Tue, Sep 9, 2008 at 7:35 AM, Bryan Murphy <[EMAIL PROTECTED]> wrote:
>> > >
>> > >> Thanks for the info.  I suspected that's what the timeout meant, but
>> > >> you never really know until you ask.
>> > >> Anyway, we finally solved our issue.  We set up two instances of
>> > >> ActiveMQ in the two data centers to forward messages back and forth
>> > >> between each other.  This is working much better for us.  It seems the
>> > >> ActiveMQ to ActiveMQ communication is a bit more robust than the
>> > >> ActiveMQ to Apache.NMS communication (at least when running over a
>> > >> WAN).
>> > >>
>> > >> Bryan
>> > >>
>> > >> On Mon, Sep 8, 2008 at 2:49 PM, Jim Gomes <[EMAIL PROTECTED]> wrote:
>> > >>
>> > >> > Hi Bryan,
>> > >> > I can't answer all of your questions, yet.  But I can answer some of
>> > >> > them, anyway.
>> > >> >
>> > >> > 1. As far as the ResponseTimeout property goes, that is used for
>> > >> > network timeouts.  It's not a JMS timeout value like TimeToLive.  The
>> > >> > ResponseTimeout is used by the client to wait for a response from the
>> > >> > broker.  Since a network call is inherently a blocking operation (send
>> > >> > request, wait for response), if we never receive a response from a
>> > >> > dead/hung broker, the client will hang as well.  The ResponseTimeout
>> > >> > lets the client abort waiting for the response from the broker.  This
>> > >> > can be set to whatever performance constraints your application
>> > >> > requires.  In a WAN environment, this might be set to something fairly
>> > >> > high where there is a lot of latency in network round-trips.  The
>> > >> > socket connection is not dropped.  The client simply stops waiting for
>> > >> > the broker to respond and goes into its error-handling code for a
>> > >> > non-response.
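
The behaviour Jim describes (give up waiting, keep the socket) can be sketched roughly as follows.  This is a Python illustration of the general pattern only, not the NMS implementation; PendingRequest and its methods are hypothetical names.

```python
import queue


class PendingRequest:
    """Hypothetical stand-in for a request awaiting a broker response."""

    def __init__(self):
        self._responses = queue.Queue(maxsize=1)

    def complete(self, response):
        # Called by the reader side when the broker's response arrives.
        self._responses.put(response)

    def wait(self, response_timeout_s):
        """Block for the response, but only up to response_timeout_s seconds."""
        try:
            return self._responses.get(timeout=response_timeout_s)
        except queue.Empty:
            # Only the wait is abandoned; nothing here closes the connection.
            raise TimeoutError("no response from broker within timeout") from None


if __name__ == "__main__":
    pending = PendingRequest()
    try:
        pending.wait(0.05)  # nothing ever answers in this demo
    except TimeoutError as exc:
        print("error-handling path:", exc)
```

On a high-latency WAN link the timeout would be set generously, so that a slow but healthy round-trip is not mistaken for a dead broker.
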
>> > >> >
>> > >> > 2. I see the marshalling code for the KeepAliveInfo, but like you I
>> > >> > don't see how this is turned on or controlled from the client-side.
>> > >> > This would need more investigation to see if it is enabled via a URI
>> > >> > parameter, or if new code needs to be written to enable its use.
>> > >> >
>> > >> > 3. Can't answer the server-side socket issue.  Don't know that code.
>> > >> >
>> > >> >
>> > >>
>> > >
>> >
>> >
>> >
>> > --
>> > James
>> > -------
>> > http://macstrac.blogspot.com/
>> >
>> > Open Source Integration
>> > http://open.iona.com
>> >
>>
>



-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com
