DEBUG is a logging level in Log4J, which can be applied to loggers via
log4j.properties.  Setting your logging level to DEBUG prints more detailed
information to the logs, which is sometimes useful for figuring out what's
going on and sometimes not.
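
For example (assuming Log4J 1.x properties syntax, and using the logger
names that appear in your own log output), something along these lines in
log4j.properties would turn on DEBUG for just the failover transport, or
for all of ActiveMQ:

  # DEBUG for the failover transport only
  log4j.logger.org.apache.activemq.transport.failover=DEBUG
  # or, more broadly, for everything under org.apache.activemq
  log4j.logger.org.apache.activemq=DEBUG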

127.0.0.1 should work, but it isn't your only choice.  localhost should
resolve to 127.0.0.1 and will behave equivalently.  Your machine also has a
hostname and one or more IP addresses, which clients on the same host
should be able to resolve; using one of those would let a client keep the
same configuration if it ever needs to be moved to another host.  The
machine might also have other FQDNs, resolvable via DNS or /etc/hosts, that
would allow a client to connect.  Ultimately the only requirement is that
the network can deliver packets to the right host and to the port the
broker is listening on.  I'd use the hostname because of the ability to
move clients around without reconfiguring them, but things should work any
way you do it.
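
For example, based on the hostname that shows up in your own reconnect log
line (adjust if your clients resolve it differently):

  failover:(tcp://QH-20151209WEVY:61616?wireFormat.maxInactivityDuration=0)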

What your log line from the second restart indicates to me is not that the
clients don't detect that the new process is back up, but rather that they
don't detect that it went away in the first place.  Have you confirmed
first and foremost that the previous process actually exited?  Second, is
the behavior different if the watchdog isn't running and doesn't
immediately restart the guard?  If so, how long does it take before the
clients detect that the broker isn't available and the failover logic (and
log lines) kick in?  And third, if you're hard-killing (kill -9) your guard
process, is the behavior different if you do a normal kill and allow the
process to exit gracefully?
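
If it helps answer the timing questions above, one way to see exactly when
a client notices the outage and when the failover transport recovers is to
attach a TransportListener to the connection and log the events yourself.
A minimal sketch (the class name and the println logging are just
illustrative; the URI is the one from your message):

import java.io.IOException;
import javax.jms.Connection;
import org.apache.activemq.ActiveMQConnection;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.transport.TransportListener;

public class FailoverDebugClient {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
                "failover:(tcp://127.0.0.1:61616?wireFormat.maxInactivityDuration=0)");
        Connection connection = factory.createConnection();

        // Log transport-level events so you can see when this client notices
        // the broker going away and when the failover transport reconnects.
        ((ActiveMQConnection) connection).addTransportListener(new TransportListener() {
            public void onCommand(Object command) { /* normal traffic; ignore */ }

            public void onException(IOException error) {
                System.out.println("transport exception: " + error);
            }

            public void transportInterupted() {  // (sic: ActiveMQ's spelling)
                System.out.println("transport interrupted at " + System.currentTimeMillis());
            }

            public void transportResumed() {
                System.out.println("transport resumed at " + System.currentTimeMillis());
            }
        });

        connection.start();
        Thread.sleep(Long.MAX_VALUE);  // keep running while you kill/restart the broker
    }
}

Killing and restarting the broker while that client is running should show
you whether (and how quickly) the failover transport ever notices the
interruption.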

I suspect that part of the cause here is TCP's inability to immediately
detect the closure of connections that are severed without going through
the normal TCP teardown, as happens when a process is kill -9'ed.  It might
even be that the clients that don't fail over simply can't tell that there
was ever a time when nothing was listening on port 61616, so what was
really two separate connections looks to them like one unbroken one.  If
the TCP layer thinks the connection never closed, the failover logic will
never kick in, which would explain what you're seeing.
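
One related thing worth experimenting with (just a suggestion, not a
definitive diagnosis): your URI sets wireFormat.maxInactivityDuration=0,
which as I understand it disables ActiveMQ's inactivity monitor, the
mechanism that sits above raw TCP and is designed to notice exactly this
kind of silently dead connection.  It might be interesting to test with the
default inactivity settings, e.g.:

  failover:(tcp://127.0.0.1:61616)

and see whether the stuck clients then detect the outage and fail over.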

I'm not sure why the behavior would be different with an external broker
than with an embedded one, though it's possible that it's something about
startup speed, or an artifact of how you're performing your test.

Tim


It's very hard to DEBUG all these processes, because GUARD will restart any
process that exits abnormally using its JAR file.

Connection URI we use:
failover:(tcp://127.0.0.1:61616?wireFormat.maxInactivityDuration=0)
(It is said that the target IP must be 127.0.0.1 if the PROCESSes and the
JMS SERVER are running on the same host.  IS THIS RIGHT?)

The first time GUARD is killed, in STEP 2, the LOGs of all four of these
PROCESSes print:
2016-02-02 08:55:19,398 WARN
[org.apache.activemq.transport.failover.FailoverTransport] - Transport
(tcp://127.0.0.1:61616) failed, reason:  java.net.SocketException:
Connection reset, attempting to automatically reconnect
2016-02-02 08:55:23,692 INFO
[org.apache.activemq.transport.failover.FailoverTransport] - Successfully
reconnected to tcp://QH-20151209WEVY:61616

The second time GUARD is killed, in STEP 4, the LOGs of the abnormal
PROCESSes print messages as usual:
2016-02-02 08:55:17,690 INFO [com.nm.server.comm.pm.ProcessManager] - send
alive msg to guard.
These abnormal PROCESSes haven't detected that ActiveMQ has been restarted.

Thanks for replying to me.



