Hi, I have an issue with the Artemis broker that I am having trouble solving and also reproducing outside of my test environment.
The setup is the following:

- 3 Artemis brokers running on separate servers, clustered in an active-active fashion with static connectors
- The clients are running JBoss 6 with the ActiveMQ 5 RA
- Messages are processed in XA transactions with MDBs
- All clients (16 of them, multiple queues each, no topics) use one separate RA (both old and new versions tested) for each broker and use the failover protocol with priorityBackup=true and randomize=false. Each RA connects to server 1, 2 or 3 and is set to fail over to the next broker in line if its broker becomes unavailable. This is done to achieve both load balancing and redundancy.

The environment is set up like this because it used to run with ActiveMQ 5 brokers as well, and this made sense at the time.

The problem I am seeing with the Artemis brokers is that after a failover-failback scenario (a broker goes down and later comes back up), messages get stuck in the "Delivering" state, and the only way to get them to roll back is to restart the broker. After a restart, though, the problem persists: the clients "prefetch" up to their limit again and then stop. No timeout occurs; messages stay like this forever, and the only way out of this state is to either restart the clients or stop all Artemis brokers, start an ActiveMQ 5 broker for ~10 seconds and then start the Artemis brokers again.

This happens on all broker restarts, but not to all clients at once, so I would guess it is some sort of timing issue. I have tried changing every config I can think of without any effect, and I have yet to reproduce the issue outside of this (legacy) test environment. I run Artemis in several other environments with newer clients (mostly ActiveMQ 5 clients as well, but without JBoss, MDBs and XA) and have zero issues.
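For reference, the failover arrangement described above corresponds to per-RA broker URLs roughly like the ones produced by the sketch below. The hostnames broker1, broker2 and broker3, the port 61616 and the helper class itself are placeholders for illustration, not the real values from this environment; only priorityBackup, randomize and prefetchPolicy.all come from the actual setup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FailoverUris {
    // Builds a failover URI that lists the preferred broker first and the
    // remaining brokers in "next in line" order after it.
    // Hostnames and port 61616 are placeholders (assumptions).
    static String build(List<String> brokers, int preferred, int prefetch) {
        List<String> ordered = new ArrayList<>();
        for (int i = 0; i < brokers.size(); i++) {
            String host = brokers.get((preferred + i) % brokers.size());
            ordered.add("tcp://" + host + ":61616");
        }
        return "failover:(" + String.join(",", ordered) + ")"
                + "?priorityBackup=true&randomize=false"
                + "&jms.prefetchPolicy.all=" + prefetch;
    }

    public static void main(String[] args) {
        List<String> brokers = Arrays.asList("broker1", "broker2", "broker3");
        // One RA per broker, each preferring a different broker.
        for (int i = 0; i < brokers.size(); i++) {
            System.out.println("RA " + (i + 1) + ": " + build(brokers, i, 100));
        }
    }
}
```

With randomize=false the client tries the brokers in the listed order, and with priorityBackup=true it moves back to the first broker in the list once that broker is reachable again, which is the failback step after which the messages get stuck.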
Some things I have noticed but have yet to piece together:

- The connectionID for the consumer that holds the messages in "Delivering" does not exist. In Hawtio I can trace the messages to a consumer, and that consumer has a corresponding session, but the session does not have an associated connection. (A connectionID is reported, but if I click on it or search for it, it does not exist.)
- The DeliveringCount goes to 1000 messages for each consumer, which is the OpenWire default for prefetched messages, but most clients use prefetchPolicy.all=100, which is otherwise respected.
- Artemis reports "Error during buffering operation", see attached file artemis_stacktrace.txt <http://activemq.2283324.n4.nabble.com/file/t378961/artemis_stacktrace.txt>
- A thread dump on the clients shows that basically all JMS-related threads are stuck at the same place, see attached file client_threads.txt <http://activemq.2283324.n4.nabble.com/file/t378961/client_threads.txt>

Br,
Anton

--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html