DEBUG is a logging level in Log4J, which can be applied to loggers via log4j.properties. Setting your logging level to DEBUG prints more detailed information to the logs, which is sometimes useful for figuring out what's going on and sometimes not.
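For example, a minimal log4j.properties fragment (just a sketch, assuming Log4J 1.x; the logger name is taken from the FailoverTransport log lines quoted later in this message, and the appender setup is only illustrative) could turn up the failover transport alone to DEBUG:

    # Keep the root logger at INFO so the output isn't flooded
    log4j.rootLogger=INFO, stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d %-5p [%c] - %m%n
    # Raise only the failover transport to DEBUG to see reconnect decisions
    log4j.logger.org.apache.activemq.transport.failover=DEBUG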
127.0.0.1 should work, but it isn't your only choice. localhost should resolve to 127.0.0.1 and will behave equivalently. Your machine also has a hostname and one or more IP addresses, which clients on the same host should be able to resolve; using one of those would let the client's configuration stay unchanged if the client ever needs to be moved to another host. The machine might also have other FQDNs, resolvable via DNS or /etc/hosts, that would allow a client to connect. Ultimately the only requirement is that the IP layer be able to deliver packets to the right host and process. I'd use the hostname because of the ability to move clients around without reconfiguring them (see the sketch at the end of this message), but things should work any way you do it.

What your log line from the second restart indicates to me is not that the clients don't detect that the new process is back up, but rather that they never detect that it went away in the first place. Have you confirmed, first and foremost, that the previous process actually exited? Second, is the behavior different if the watchdog isn't running and doesn't immediately restart the guard? If so, how long does it take before the clients detect that the broker isn't available and the failover logic (and log lines) kick in? And third, if you're hard-killing (kill -9) your guard process, is the behavior different if you do a normal kill and allow the process to exit gracefully?

I suspect that part of the cause here is TCP's inability to immediately detect the closure of connections that are severed without going through the TCP connection teardown logic, as happens when a process is kill -9'ed. It might even be that the clients that don't fail over simply can't tell that there was ever a time when no process was responding on port 61616, so the TCP layer perceives just a single connection when you know there were really two. If the TCP layer thinks the connection never closed, the failover logic will never kick in, which would explain what you're seeing.

I'm not sure why the behavior would be different with an external broker than with an embedded one, though it's possible that it's something about startup speed, or an artifact of how you're performing your test.

Tim

> It's very hard to DEBUG all these PROCESSes, because GUARD will restart any process that exits abnormally from the JAR file.
>
> The connection URI we use is:
>
>     failover:(tcp://127.0.0.1:61616?wireFormat.maxInactivityDuration=0)
>
> (It is said that the target IP must be 127.0.0.1 if the PROCESSes and the JMS SERVER are running on the same host. Is this right?)
>
> The first time GUARD is killed, in STEP 2, the logs of all four PROCESSes print:
>
>     2016-02-02 08:55:19,398 WARN [org.apache.activemq.transport.failover.FailoverTransport] - Transport (tcp://127.0.0.1:61616) failed, reason: java.net.SocketException: Connection reset, attempting to automatically reconnect
>     2016-02-02 08:55:23,692 INFO [org.apache.activemq.transport.failover.FailoverTransport] - Successfully reconnected to tcp://QH-20151209WEVY:61616
>
> The second time GUARD is killed, in STEP 4, the logs of the abnormal PROCESSes keep printing the usual message:
>
>     2016-02-02 08:55:17,690 INFO [com.nm.server.comm.pm.ProcessManager] - send alive msg to guard.
>
> These abnormal PROCESSes haven't detected that ActiveMQ has been restarted.
>
> Thanks for replying to me.
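A minimal sketch of the hostname suggestion above, assuming a plain JMS client on ActiveMQ 5.x. The hostname QH-20151209WEVY and the maxInactivityDuration parameter are copied from the quoted message; the class name and setup are purely illustrative, not the poster's actual code:

    import javax.jms.Connection;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class FailoverClientSketch {
        public static void main(String[] args) throws Exception {
            // Same failover URI as in the quoted message, but pointing at the
            // machine's hostname instead of 127.0.0.1, so the client's
            // configuration survives being moved to another host.
            String uri = "failover:(tcp://QH-20151209WEVY:61616?wireFormat.maxInactivityDuration=0)";
            ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(uri);
            Connection connection = factory.createConnection();
            connection.start();
            // ... create sessions, producers and consumers as usual ...
        }
    }

The only intended change from the quoted URI is the host name; everything else is left as the poster had it.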