RabbitMQ queue consumption issue

Rene Cordier Sun, 05 Nov 2023 19:34:40 -0800

Hello guys!

Maybe some people in the community have run into issues similar to the ones some of us have had with RabbitMQ for a while when using the distributed-app: James would sometimes stop consuming a queue, and we would just restart James manually.

We encountered this with the TaskManagerWorkQueue while running a heavy task on it that took a few hours, with other tasks arriving and piling up in the queue, waiting to be consumed. We could then observe that James would nack the messages in the queue (telling RabbitMQ it will process them later, ideally after finishing its current task). But then, after 30 minutes, we could see James stop consuming the queue and all items go back to the ready state.

The reason is that RabbitMQ, as a safety measure, applies a timeout to delivery acknowledgements. If a consumer fails to ack a delivery within a certain time (30 minutes by default), RabbitMQ closes the channel with a `PRECONDITION_FAILED` channel exception: https://www.rabbitmq.com/consumers.html#acknowledgement-timeout
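To make the rule concrete, here is a toy Java model of it. It is hypothetical and is neither broker code nor James code. A delivery that stays unacknowledged for longer than the consumer timeout leads the broker to close the channel with reply code 406, which is the AMQP 0-9-1 code for `PRECONDITION_FAILED`. The class and method names are made up for illustration, and the model does not capture when exactly the broker runs its check.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical toy model of RabbitMQ's delivery acknowledgement timeout.
 * The real check is done by the broker; this only mirrors the rule for illustration.
 */
final class AckTimeoutModel {
    // AMQP 0-9-1 reply code for PRECONDITION_FAILED.
    static final int PRECONDITION_FAILED = 406;
    // Default consumer timeout mentioned in the RabbitMQ docs (30 minutes).
    static final Duration DEFAULT_CONSUMER_TIMEOUT = Duration.ofMinutes(30);

    private AckTimeoutModel() {}

    /**
     * Returns 406 if a delivery still unacknowledged since {@code deliveredAt}
     * has been pending longer than {@code timeout} at {@code now}; returns 0 otherwise
     * (meaning: the channel stays open as far as this rule is concerned).
     */
    static int channelCloseCode(Instant deliveredAt, Instant now, Duration timeout) {
        Duration pending = Duration.between(deliveredAt, now);
        return pending.compareTo(timeout) > 0 ? PRECONDITION_FAILED : 0;
    }
}
```

For example, a delivery received at some instant T and still not acked at T + 31 minutes yields 406 under the default timeout, while one checked at T + 10 minutes yields 0.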

From there, we think James could sometimes also fail to ack a message properly for some other reason, and then stop consuming that queue, as happened in the past with the mail queue.

We can act on this, for instance by reconnecting when we detect such an issue on the channel.
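A minimal Java sketch of such a reconnection loop is below. It assumes a hypothetical `QueueSubscription` abstraction, which is not an existing James or RabbitMQ API. Its `awaitClose()` method blocks until the channel closes and then returns the AMQP reply code. Whenever the channel closes with 406 (`PRECONDITION_FAILED`), the loop opens a fresh subscription, up to a restart budget. In real code the close notification would come from the client library, e.g. a shutdown listener in the RabbitMQ Java client.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a consumer bound to one channel; not a James or RabbitMQ API. */
interface QueueSubscription {
    /** Blocks until the underlying channel closes and returns the AMQP reply code it closed with. */
    int awaitClose();
}

final class ResubscribingConsumer {
    // AMQP 0-9-1 reply code for PRECONDITION_FAILED (e.g. ack timeout exceeded).
    static final int PRECONDITION_FAILED = 406;

    private ResubscribingConsumer() {}

    /**
     * Opens a subscription via {@code subscribe}; each time the channel closes with
     * PRECONDITION_FAILED, opens a new one, at most {@code maxRestarts} times.
     * Any other close code stops the loop. Returns the number of restarts performed.
     */
    static int run(Supplier<QueueSubscription> subscribe, int maxRestarts) {
        int restarts = 0;
        while (true) {
            int code = subscribe.get().awaitClose();
            if (code != PRECONDITION_FAILED || restarts >= maxRestarts) {
                return restarts;
            }
            restarts++;
        }
    }
}
```

For example, if the channel closes twice with 406 and then with 200 (a normal close), `run` performs two restarts. Capping the restarts is a design choice that avoids spinning forever if every new subscription hits the same problem.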

More details in the JIRA ticket: https://issues.apache.org/jira/browse/JAMES-3955

Benoit seems to have taken a shot at resuming consumption on queues that lose their consumers as well, if some people want to check it out: https://github.com/apache/james-project/pull/1778

If there are some RabbitMQ experts in the community who have better ideas or other suggestions, don't hesitate!


Thanks and cheers guys,

Rene.
