On Tue, Apr 26, 2016 at 08:29:23PM -0600, Tim Bain wrote:
> Cross-datacenter connectivity is definitely possible.
> 
> You can configure ActiveMQ to discard messages once the dispatch queue is
> full, and you can specify strategies for deciding which messages to
> discard, but it appears that you're already using them so maybe you're
> looking for something different?

I'd like the cluster to resume being clustered, without manual intervention, 
after network connectivity is restored. Everything else works well.

Also, an update on the config: I tried out

<constantPendingMessageLimitStrategy limit="50"/>

on all brokers, and that appears to have dealt with the error message at least. 
I'm still seeing the apparent clustering failure, though.
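
For the archives, that now sits in the topic policyEntry in place of the 
prefetchRatePendingMessageLimitStrategy quoted further down:

<policyEntry topic=">" producerFlowControl="false" usePrefetchExtension="false">
  <messageEvictionStrategy>
    <oldestMessageEvictionStrategy/>
  </messageEvictionStrategy>
  <pendingMessageLimitStrategy>
    <constantPendingMessageLimitStrategy limit="50"/>
  </pendingMessageLimitStrategy>
</policyEntry>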

> Reconnecting automatically is the default behavior for networkConnectors
> using the static transport, but first your broker has to recognize that the
> remote broker is no longer available, and that doesn't happen
> instantaneously for network failures.  Till it does, the remote broker will
> be considered slow (if there are enough messages pending), and you'll see
> log lines like those, after which I'd expect you'll see lines about trying
> and failing to reconnect, until connectivity is restored and the logs go
> back to normal.  Is that the behavior you're seeing?

We had another cluster breakdown last night, and this time I am seeing a 
successful reconnection attempt in the logs (pardon the privacy munging).

North America, centre of the star (the first log lines in several hours):

INFO   | jvm 1    | 2016/04/26 19:41:40 |  WARN | Network connection between 
vm://mcomq3.me.com#20 and ssl:///1.2.3.4:52022 shutdown due to a remote error: 
org.apache.activemq.transport.InactivityIOException: Channel was inactive for 
too (>30000) long: tcp://1.2.3.4:52022
INFO   | jvm 1    | 2016/04/26 19:41:40 |  INFO | mcomq3.me.com bridge to 
mcomq4.me.eu stopped
INFO   | jvm 1    | 2016/04/26 19:41:41 |  INFO | Started responder end of 
duplex bridge 
mcomq4.me.eu-mcomq3.me.com-topics@ID:mcomq4-53890-1461690928760-0:1
INFO   | jvm 1    | 2016/04/26 19:41:41 |  INFO | Network connection between 
vm://mcomq3.me.com#40 and ssl:///1.2.3.4:52044 (mcomq4.me.eu) has been 
established.

Europe (also first log message in several hours):

INFO   | jvm 1    | 2016/04/26 23:41:40 |  WARN | Network connection between 
vm://mcomq4.me.eu#0 and ssl://mcomq3.me/5.6.7.8:61617 shutdown due to a remote 
error: java.io.EOFException
INFO   | jvm 1    | 2016/04/26 23:41:40 |  INFO | Establishing network 
connection from vm://mcomq4.me.eu?async=false to ssl://mcomq3.me:61617
INFO   | jvm 1    | 2016/04/26 23:41:40 |  INFO | mcomq4.me.eu bridge to 
mcomq3.me stopped
INFO   | jvm 1    | 2016/04/26 23:41:41 |  INFO | Network connection between 
vm://mcomq4.me.eu#8 and ssl://mcomq3.me/5.6.7.8:61617 (mcomq3.me) has been 
established.
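
One thing I noticed: the ">30000" in that first WARN matches OpenWire's default 
maxInactivityDuration of 30000 ms. I'm wondering whether raising it on the 
networkConnector URI would cut down on false disconnects when the transatlantic 
link is slow rather than down. Something like this; 90000 is an arbitrary guess, 
and my reading of the wire format docs is that both ends negotiate down to the 
lower value, so the remote transportConnector might need 
wireFormat.maxInactivityDuration raised as well (untested):

<networkConnector
    name="mcomq4.me.eu-mcomq3.me.com-topics"
    uri="static:(ssl://mcomq3.me.com:61617?wireFormat.maxInactivityDuration=90000)"
    ...everything else as in the config below... />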

Despite those successful-looking reconnections, it turned out the clustering 
had fallen flat on its face again.

Having read what you said about restarts, I restarted the three brokers at the 
leaves (terminology?) of the star. I saw them connect in the logs, but no 
messages were being passed through in my mco client tests.

After that I restarted only the central broker; the other brokers reconnected 
and I had the cluster back. (Previously I would bring them all down and start 
them centre-first, which would bring the cluster back.)
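
In init-script terms, and assuming the stock /etc/init.d/activemq service 
script (adjust for your setup), the recovery this time amounted to:

  # on mcomq3.me.com, the centre of the star, and nowhere else:
  service activemq restart

after which the three leaves re-established their bridges on their own.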

So the actual reconnection behaviour is definitely working from the client 
side. I'm wondering whether it's something about the central broker in the star 
that is having trouble after these disconnections. Any hints are very much 
welcome; this is almost certainly self-inflicted.
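
One thing I plan to try next time it happens: inspecting the network bridge 
MBeans on the central broker over JMX (jconsole or similar) to see whether the 
bridges the leaves re-establish actually show up there. From the 5.x JMX docs 
I'd expect them under an object name pattern along these lines, though I 
haven't confirmed how the responder ends of duplex bridges are named:

  org.apache.activemq:type=Broker,brokerName=mcomq3.me.com,connector=networkConnectors,networkConnectorName=*,networkBridge=*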

> And yes, in the case of a network partition, a broker in a given partition
> should be able to service the clients in that partition, though not
> necessarily the clients in another partition.
> 
> Tim
> On Apr 25, 2016 11:22 AM, "Christopher Wood" <christopher_w...@pobox.com>
> wrote:
> 
> > As background, there is an activemq cluster (5.13.2 on CentOS 6.7, star
> > topology) here to support mcollective. One datacenter is on the other side
> > of the Atlantic and every time inter-datacenter connectivity is interrupted
> > we see this prefetch log fragment and the clustering to that activemq
> > instance stops working.
> >
> > INFO   | jvm 1    | 2016/04/22 21:47:45 |  WARN | TopicSubscription:
> > consumer=mcomq4.me.eu->mcomq3.me.com-43531-1461361657724-32:1:1:1,
> > destinations=75, dispatched=1000, delivered=0, matched=1001, discarded=0:
> > has twice its prefetch limit pending, without an ack; it appears to be slow
> >
> > How would I get the activemq initiating the connection to stop clogging
> > like this and just try to reconnect periodically?
> >
> > Or is it even reasonable to cluster activemq between datacenters?
> >
> > More:
> >
> > I haven't found any activemq.xml setting which reads like "automatically
> > try to reconnect" or "just throw away older messages". There are
> > activemq.xml bits below.
> >
> > The clustering works well until that log line. The actual instance in
> > Europe and daemons connecting to it work just fine after the log line above
> > as long as I keep my requests local to that datacenter.
> >
> >
> > Bits from activemq.xml:
> >
> > <destinationPolicy>
> >   <policyMap>
> >     <policyEntries>
> >       <policyEntry topic=">" producerFlowControl="false"
> > usePrefetchExtension="false">
> >         <messageEvictionStrategy>
> >           <oldestMessageEvictionStrategy/>
> >         </messageEvictionStrategy>
> >         <pendingMessageLimitStrategy>
> >           <prefetchRatePendingMessageLimitStrategy multiplier="2"/>
> >         </pendingMessageLimitStrategy>
> >       </policyEntry>
> >       <policyEntry queue="*.reply.>" gcInactiveDestinations="true"
> > inactiveTimoutBeforeGC="300000" />
> >     </policyEntries>
> >   </policyMap>
> > </destinationPolicy>
> >
> > <networkConnectors>
> >   <networkConnector
> >       name="mcomq4.me.eu-mcomq3.me.com-topics"
> >       uri="static:(ssl://mcomq3.me.com:61617)"
> >       userName="amq"
> >       password="password"
> >       duplex="true"
> >       decreaseNetworkConsumerPriority="true"
> >       networkTTL="3"
> >       dynamicOnly="true">
> >     <excludedDestinations>
> >       <queue physicalName=">" />
> >     </excludedDestinations>
> >   </networkConnector>
> >   <networkConnector
> >       name="mcomq4.me.eu-mcomq3.me.com-queues"
> >       uri="static:(ssl://mcomq3.me.com:61617)"
> >       userName="amq"
> >       password="password"
> >       duplex="true"
> >       decreaseNetworkConsumerPriority="true"
> >       networkTTL="3"
> >       dynamicOnly="true"
> >       conduitSubscriptions="false">
> >     <excludedDestinations>
> >       <topic physicalName=">" />
> >     </excludedDestinations>
> >   </networkConnector>
> > </networkConnectors>
> >
> > <transportConnectors>
> >     <transportConnector name="stomp+nio+ssl"
> >         uri="stomp+ssl://0.0.0.0:61614?needClientAuth=true&amp;transport.enabledProtocols=TLSv1,TLSv1.1,TLSv1.2&amp;transport.hbGracePeriodMultiplier=5"/>
> >     <transportConnector name="openwire+nio+ssl"
> >         uri="ssl://0.0.0.0:61617?needClientAuth=true&amp;transport.enabledProtocols=TLSv1,TLSv1.1,TLSv1.2"/>
> > </transportConnectors>
> >
> >
> > Other things I've read to try and understand this:
> >
> > https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_A-MQ/6.0/html-single/Using_Networks_of_Brokers/index.html
> >
> > https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_A-MQ/6.0/html-single/Tuning_Guide/
> >
> > http://activemq.apache.org/slow-consumer-handling.html
> >
> >
> > My previous threads elsewhere, when I did not understand that it was a
> > specific network event causing clustering to break:
> >
> > https://groups.google.com/forum/#!topic/mcollective-users/MkHSVHt9uEI
> >
> > https://groups.google.com/forum/#!topic/mcollective-users/R2mEnuV5eK8
> >
