timeout settings related question

Attila Wind Sun, 01 May 2022 09:10:25 -0700

Hi RMQ Users,

We are running a 3-node RocketMQ cluster, version 4.7.0, with master nodes only. Our app language is Java and we are using org.apache.rocketmq:rocketmq-client:4.2.0

Recently we had an outage. One of the nodes went down due to a hardware failure. The node was unavailable for 50 minutes.


What we noticed during this time was:

 * About 1/3 of the message sends started to wait 3 seconds - the
   ones that were directed at the dead node. The producer then
   retried another node and the message was produced successfully.
 * The above behavior lasted for 15 minutes - after 15 minutes it
   looked like no producer tried to send messages to the failed node
   anymore.
 * After 50 minutes, when the failed node returned, it immediately
   started to get messages from the producers again.

So actually we realized there are 2 timeouts here.

The first one, the 3-second timeout, we believe we found here: org.apache.rocketmq.client.producer.DefaultMQProducer.sendMsgTimeout

That's fine.
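
To make the first observation concrete, here is a minimal sketch of how we understand it. This is not RocketMQ client code, just a stdlib-only simulation under our own assumptions: the class SendTimeoutSketch, the method send and the broker names are hypothetical, and the 3000 ms value stands in for sendMsgTimeout. A send that starts at the dead node loses one timeout and then succeeds on the next node.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of the behaviour we observed; it does not use
// or reproduce the RocketMQ client internals.
public class SendTimeoutSketch {

    /** Outcome of one simulated send. */
    public static final class Result {
        public final String acceptedBy; // null if every attempt failed
        public final long waitedMs;     // simulated time lost in timeouts

        Result(String acceptedBy, long waitedMs) {
            this.acceptedBy = acceptedBy;
            this.waitedMs = waitedMs;
        }
    }

    /**
     * Tries brokers round-robin starting at startIndex. An attempt against
     * a broker listed in 'down' costs one full timeout (assumption: the
     * timeout applies per attempt), then the next broker is tried.
     */
    public static Result send(List<String> brokers, Set<String> down,
                              int startIndex, long sendMsgTimeoutMs,
                              int maxAttempts) {
        long waited = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String broker = brokers.get((startIndex + attempt) % brokers.size());
            if (down.contains(broker)) {
                waited += sendMsgTimeoutMs; // no answer until the timeout fires
                continue;
            }
            return new Result(broker, waited);
        }
        return new Result(null, waited);
    }

    public static void main(String[] args) {
        List<String> brokers = Arrays.asList("broker-a", "broker-b", "broker-c");
        Set<String> down = Collections.singleton("broker-b");
        // One send per possible starting broker: only the one that starts
        // at the dead broker pays the timeout.
        for (int i = 0; i < brokers.size(); i++) {
            Result r = send(brokers, down, i, 3000, 3);
            System.out.println("start at " + brokers.get(i)
                    + ": accepted by " + r.acceptedBy
                    + ", waited " + r.waitedMs + " ms");
        }
    }
}
```

With three brokers and one of them down, one starting position out of three hits the dead node first, which matches the roughly 1/3 of slow sends we saw. In the real client the timeout itself is adjusted with DefaultMQProducer.setSendMsgTimeout.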

*But the 2nd one, the 15-minute timeout (after which the failed node is eventually marked as dead), we could not find anywhere...* We also tried to take a look at the RocketMQ Nameserver code, because our idea was that in the end it could be the Nameserver that marks the node as dead, but no luck. :-(

Our goal would be to shorten this 15-minute timeout if possible. Given the 3rd observation above (when the node came back it rejoined the cluster seamlessly), we believe something like 5 minutes would be much better for our app.

*Does anyone know whether changing this 15-minute timeout is possible, and if yes, how/where?*


thanks!

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
