Hi,

are you able to reproduce your issue without Kubernetes layer on Artemis 
instances? I’m not sure how exactly do you kill a pod, but 40 second timeout 
very much looks like default pod grace timeout + 10 seconds.

--
    Vilius

From: John Lilley <john.lil...@redpointglobal.com.INVALID>
Sent: Wednesday, February 21, 2024 8:45 PM
To: users@activemq.apache.org
Cc: Lino Pereira <lino.pere...@redpointglobal.com>
Subject: HA failover: Nothing we try reduces client recovery below one minute

Greetings!

We are having a devil of a time trying to reduce delay during a failover event. 
 We’ve set our URL to
(tcp://dm-activemq-live-svc:61616,tcp://dm-activemq-backup-svc:61617)?ha=true&reconnectAttempts=200&initialConnectAttempts=200&clientFailureCheckPeriod=10000&connectionTTL=10000&callTimeout=10000

But nothing seems to reduce the time it takes for a sender to detect the issue 
and failover.  With or without these parameters, it takes a full minute to 
recover after failover. Are there other parameters to adjust?  Something in the 
broker?  Is there some internal retry loop and timer that waits longer that we 
can influence?

We are testing failover by killing one of the AMQ pods.  This almost always 
succeeds without issue, except occasionally it doesn’t.

The backup broker log shows that it assumes control very quickly
2024-02-16 22:42:56,523 INFO  [org.apache.activemq.artemis.core.server] 
AMQ221007: Server is now live
2024-02-16 22:42:56,533 INFO  [org.apache.activemq.artemis.core.server] 
AMQ221020: Started EPOLL Acceptor at 0.0.0.0:61617 for protocols 
[CORE,MQTT,AMQP,STOMP,HORNETQ,OPENWIRE]
2024-02-16 22:42:58,074 INFO  
[net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] SEND: HEADER= 
{"version":1,"type":"void","id":"jagbjffeu3of","api":"hpa_metrics","method":"get_expected_pod_counts","authorization":"Q979k6lzOu9KmqMA89GbvmZnIAMNpJJP/TEAAa7Yjpo="},
 BODY={"message_type":"void"}
2024-02-16 22:42:58,254 INFO  
[net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] DELIVER: HEADER= 
{"version":1,"type":"void","id":"jagbjffeu3of","api":"hpa_metrics","method":"get_expected_pod_counts","authorization":"Q979k6lzOu9KmqMA89GbvmZnIAMNpJJP/TEAAa7Yjpo="},
 BODY={"message_type":"void"}

In our app logs, we show that we are attempting to send a message to the AMQ 
broker at the time of failover:
2024-02-16T22:42:55.981 [http-nio-9910-exec-9] 
JmsRpcClientChannel.prepareCall:84 [9v2zwvclclrc] INFO - REQUEST OUT: { 
"header": 
{"version":1,"type":"get_task_status_request","id":"9v2zwvclclrc","api":"test_harness","method":"get_task_status","instance":"combined","authorization":"***REDACTED***"},
 
"body":{"id":"d1123865-ac47-4238-ab02-5f2324a43264","progress_start_index":0,"message_type":"get_task_status_request"}
 }

But then it takes 40 seconds for the “10000ms” timeout to happen.
2024-02-16T22:43:35.988 [http-nio-9910-exec-9] JmsProducerPool.send_:376 
[9v2zwvclclrc] WARN - Error sending message, will retry javax.jms.JMSException: 
AMQ219014: Timed out after waiting 10000 ms for response when sending packet 71

Our problem is really that this 40-second delay seems to push everything back 
so that recovery takes over a minute.  Once we get into the one-minute mark, we 
start hitting several timeouts, like our own RPC timeout setting and the nginx 
ingress controller for K8S.  How can we know why this takes so long?  We’ve set 
every timeout we can find for the AMQ client to 10 seconds.  Is there some 
other setting we need to adjust?


Meanwhile the primary broker pod has returned, and the backup decides it is not 
needed:
2024-02-16 22:43:16,115 INFO  [org.apache.activemq.artemis.core.server] 
AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.31.2 
[10952195-b6ec-11ee-9c87-aa03cb64206a] stopped, uptime 16 minutes
2024-02-16 22:43:16,115 INFO  [org.apache.activemq.artemis.core.server] 
AMQ221039: Restarting as Replicating backup server after live restart

But our app is waiting for its response on the reply-to queue, and gets this 
error nearly a minute later:
2024-02-16T22:43:58.033 [Thread-6] JmsStaticConnectionPool.onException:78 [] 
ERROR - Receive error occurred. javax.jms.JMSException: 
ActiveMQDisconnectedException[errorType=DISCONNECTED message=AMQ219015: The 
connection was disconnected because of server shutdown]

This is the stack trace at time of the first timeout:
                at 
org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:550)
                at 
org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:452)
                at 
org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.sendFullMessage(ActiveMQSessionContext.java:588)
                at 
org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.sendRegularMessage(ClientProducerImpl.java:305)
                at 
org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.doSend(ClientProducerImpl.java:277)
                at 
org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.send(ClientProducerImpl.java:147)
                at 
org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.send(ClientProducerImpl.java:129)
                at 
org.apache.activemq.artemis.jms.client.ActiveMQMessageProducer.doSendx(ActiveMQMessageProducer.java:483)
                at 
org.apache.activemq.artemis.jms.client.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:221)
                at 
net.redpoint.ipc.jms.JmsProducerPool.send_(JmsProducerPool.java:372)
                at 
net.redpoint.ipc.jms.JmsProducerPool.sendRequest(JmsProducerPool.java:301)
                at 
net.redpoint.ipc.jms.JmsRpcClientChannel.sendRequest(JmsRpcClientChannel.java:228)
                at 
net.redpoint.ipc.jms.JmsRpcClientChannel.invokeRaw(JmsRpcClientChannel.java:202)
                at 
net.redpoint.ipc.jms.JmsRpcClientChannel.call(JmsRpcClientChannel.java:101)
                at 
net.redpoint.ipc.clients.RpcClientBase._sync(RpcClientBase.java:169)
                at 
net.redpoint.ipc.clients.RpcClientBase._rpc(RpcClientBase.java:237)
                at 
net.redpoint.rpdm.ipc.clients.TestHarnessClient.getTaskStatus(TestHarnessClient.java:229)
                at 
net.redpoint.rpdm.ipc.web_service_gateway.TestHarnessHttpServer.lambda$getTaskStatus$25(TestHarnessHttpServer.java:390)
                at 
net.redpoint.ipc.SecurityControl.doAsNoThrow(SecurityControl.java:272)
                at 
net.redpoint.rpdm.ipc.web_service_gateway.TestHarnessHttpServer.lambda$getTaskStatus$26(TestHarnessHttpServer.java:390)
                at 
net.redpoint.ipc.InstanceControl.doAsNoThrow(InstanceControl.java:84)
                at 
net.redpoint.rpdm.ipc.web_service_gateway.TestHarnessHttpServer.getTaskStatus(TestHarnessHttpServer.java:390)
                at 
jdk.internal.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
                at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.base/java.lang.reflect.Method.invoke(Method.java:568)
                at 
org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
                at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146)
                at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189)
                at 
org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219)
                at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)
                at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)
                at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)
                at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
                at 
org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:261)
                at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
                at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
                at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
                at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
                at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
                at 
org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
                at 
org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:240)
                at 
org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:697)
                at 
org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
                at 
org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
                at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:357)
                at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:311)
                at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
                at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:205)
                at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
                at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:167)
                at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90)
                at 
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:482)
                at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:115)
                at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93)
                at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
                at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:340)
                at 
org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:391)
                at 
org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)
                at 
org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:896)
                at 
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1744)
                at 
org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)
                at 
org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
                at 
org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
                at 
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
                at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT 
message=AMQ219014: Timed out after waiting 10000 ms for response when sending 
packet 71]
                ... 65 more


This is the stack trace at the second timeout message
                at 
org.apache.activemq.artemis.jms.client.ActiveMQConnection$JMSFailureListener.connectionFailed(ActiveMQConnection.java:714)
                at 
org.apache.activemq.artemis.jms.client.ActiveMQConnection$JMSFailureListener.connectionFailed(ActiveMQConnection.java:735)
                at 
org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.callSessionFailureListeners(ClientSessionFactoryImpl.java:868)
                at 
org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.callSessionFailureListeners(ClientSessionFactoryImpl.java:856)
                at 
org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:802)
                at 
org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:566)
                at 
org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingFailureListener.connectionFailed(ClientSessionFactoryImpl.java:1407)
                at 
org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.callFailureListeners(AbstractRemotingConnection.java:98)
                at 
org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:212)
                at 
org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$CloseRunnable.run(ClientSessionFactoryImpl.java:1172)
                at 
org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:57)
                at 
org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:32)
                at 
org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:68)
                at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
                at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
                at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
Caused by: ActiveMQDisconnectedException[errorType=DISCONNECTED 
message=AMQ219015: The connection was disconnected because of server shutdown]
                ... 7 more




[rg]<https://www.redpointglobal.com/>

John Lilley

Data Management Chief Architect, Redpoint Global Inc.

34 Washington Street, Suite 205 Wellesley Hills, MA 02481

M: +1 7209385761<tel:+1%207209385761> | 
john.lil...@redpointglobal.com<mailto:john.lil...@redpointglobal.com>

PLEASE NOTE: This e-mail from Redpoint Global Inc. (“Redpoint”) is confidential 
and is intended solely for the use of the individual(s) to whom it is 
addressed. If you believe you received this e-mail in error, please notify the 
sender immediately, delete the e-mail from your computer and do not copy, print 
or disclose it to anyone else. If you properly received this e-mail as a 
customer, partner or vendor of Redpoint, you should maintain its contents in 
confidence subject to the terms and conditions of your agreement(s) with 
Redpoint.

Reply via email to