Hello,

    no one has any suggestions?

    Regards,

emmanuel

On 22/10/2018 16:04, Emmanuel Touzery wrote:
Hello,

    we have a tomee+ 7.0.3 installation with activemq, using KahaDB as the persistent message store. We have an activemq.xml, which we plugged in through:

BrokerXmlConfig = xbean:file:/opt/app/tomee/conf/activemq.xml

    in the tomee.xml. The ActiveMQ broker runs within TOMEE:

ServerUrl       =  tcp://127.0.0.1:61616
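    For reference, the two settings above sit in a resource adapter entry in tomee.xml; a minimal sketch, assuming a TomEE ActiveMQResourceAdapter resource (the resource id "MyJmsResourceAdapter" is illustrative, not our real name):

```xml
<!-- Sketch of the tomee.xml resource adapter entry;
     the id "MyJmsResourceAdapter" is illustrative. -->
<Resource id="MyJmsResourceAdapter" type="ActiveMQResourceAdapter">
    BrokerXmlConfig = xbean:file:/opt/app/tomee/conf/activemq.xml
    ServerUrl       = tcp://127.0.0.1:61616
</Resource>
```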

    We have a prefetch of 2000:

<transportConnector name="nio" uri="nio://0.0.0.0:61616?jms.prefetchPolicy.all=2000"/>

    We use mKahaDB. We disabled producer flow control.
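    For context, the relevant activemq.xml pieces (mKaha persistence, flow control disabled, and the transport connector) look roughly like this; this is a sketch, and the queue pattern and data directory are illustrative, not our exact configuration:

```xml
<!-- Sketch of the relevant activemq.xml sections; the queue pattern
     ">" and the data directory are illustrative. -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="localhost">
  <destinationPolicy>
    <policyMap>
      <policyEntries>
        <!-- producerFlowControl="false" disables flow control on all queues -->
        <policyEntry queue=">" producerFlowControl="false"/>
      </policyEntries>
    </policyMap>
  </destinationPolicy>
  <persistenceAdapter>
    <!-- mKahaDB: one KahaDB journal per destination -->
    <mKahaDB directory="${activemq.data}/mkahadb">
      <filteredPersistenceAdapters>
        <filteredKahaDBPersistenceAdapter perDestination="true">
          <persistenceAdapter>
            <kahaDB/>
          </persistenceAdapter>
        </filteredKahaDBPersistenceAdapter>
      </filteredPersistenceAdapters>
    </mKahaDB>
  </persistenceAdapter>
  <transportConnectors>
    <transportConnector name="nio" uri="nio://0.0.0.0:61616?jms.prefetchPolicy.all=2000"/>
  </transportConnectors>
</broker>
```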

    So that everything would work, we had to add a few JARs to the TOMEE lib folder:

activemq-spring-5.14.3.jar
spring-beans-3.2.9.RELEASE.jar
spring-context-3.2.9.RELEASE.jar
spring-core-3.2.9.RELEASE.jar
spring-expression-3.2.9.RELEASE.jar
spring-web-3.2.9.RELEASE.jar
xbean-spring-3.9.jar

    We are "reading" from JMS through message-driven beans, implementing MessageListener and with @MessageDriven annotations.
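    Our consumers follow the standard MDB shape; a minimal sketch of what one looks like (the class name, destination name, and processing logic are illustrative, not our real code):

```java
// Illustrative consumer sketch; class name, destination and
// processing logic are made up for this example.
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination",
                              propertyValue = "incomingData")
})
public class IncomingDataConsumer implements MessageListener {
    @Override
    public void onMessage(Message message) {
        try {
            String json = ((TextMessage) message).getText();
            // parse the JSON and persist it via Hibernate here
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }
}
```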

    The application is pretty simple: receive data over HTTP/JSON and store it in SQL (through Hibernate).

    Everything works fine as long as the traffic is normal. However, when there is a surge of incoming traffic, sometimes the JMS consumers stop getting called and the queue only grows. The issue does not go away until TOMEE is restarted, and we've then seen it re-appear maybe 40 minutes later. After a while, the server clears the queue and everything is fine again.

    We took a jstack thread dump of the application when it's in that "hung" state:
https://www.dropbox.com/s/p8wy7uz6inzsmlj/jstack.txt?dl=0

    What's interesting is that writes fall quite fast and in steps: in general not all at once, but not slowly either:
https://www.dropbox.com/s/nhm5s2zc7r9mk9z/graph_writes.png?dl=0

    After a restart things are fine again immediately.

    We're not sure what the cause is. From what we can tell from the thread dump, the consumers are idle; they just don't get notified that work is available. The broker is certainly aware there are items in the queue: we monitor the queue through JMX and the queue size keeps growing during these episodes. We don't see anything out of the ordinary in the logs. We looked at the thread IDs of the consumers just before the issue, and it doesn't look like the consumers deadlock one after the other; a bunch of them are still called in the last minute before the drop-off. Also, during a blackout the JDBC pool usage is at 0 according to our JMX monitoring, so it doesn't seem to be a deadlocked JDBC connection either.

    We did notice the following ActiveMQ warnings in the log file, but the timestamps don't match any particular events, and from what we found out they don't seem particularly worrying or likely to be related to the issue:

WARNING [ActiveMQ Journal Checkpoint Worker] org.apache.activemq.store.kahadb.MessageDatabase.getNextLocationForAckForward Failed to load next journal location: null

WARNING [ActiveMQ NIO Worker 6] org.apache.activemq.broker.TransportConnection.serviceTransportException Transport Connection to: tcp://127.0.0.1:37024 failed: java.io.EOFException

    Do you have any suggestions for fixing this issue (which we sadly can't reproduce at will, and which only happens pretty rarely)? Or should we rather ask on the ActiveMQ mailing list?

    Regards,

emmanuel
