Hi,

I work with Piotr on this issue. Let me try to provide some additional
information on our slow-down issue:

Storage is a PostgreSQL Server 9.3.2 on a Debian Wheezy / Kernel 3.2.51-1
System.

We use JDBC and the PGPoolingDataSource
(org.postgresql.ds.PGPoolingDataSource).

This is the persistenceAdapter configuration:
<persistenceAdapter>
    <jdbcPersistenceAdapter dataDirectory="activemq-data"
                            dataSource="#postgres-ds"
                            lockKeepAlivePeriod="0"
                            createTablesOnStartup="false" />
</persistenceAdapter>

We have 2 destination interceptors set up. We also run the demo code
(jetty-demo), because some of our applications use the HTTP/REST interface it provides. We don't run Camel.

Other than that it's a pretty mundane setup. We also run two instances at the same time as a sort of fail-over. Because of the JDBC backend, only one of them is active, and we use the failover protocol on the client side to connect to the active one. We use haproxy to serve the web interface from the active instance. Both ActiveMQ instances run on the same Linux box, with different service IP addresses (they use the same binaries; only the configuration and data directory are separate). The reason we run two instances is that we had big stability issues before, with the ActiveMQ process more or less hanging itself up. We could move away from that setup, because with 5.10 this hasn't happened.

Like the database server, the Linux box that runs the ActiveMQ instances is a Debian Wheezy system, but with kernel 3.2.60-1+deb7u1.

Problem description: Once in a while we see 100% CPU load on the database.
We can trace that to SQL statements of this form:

SELECT ID, PRIORITY FROM ACTIVEMQ_MSGS WHERE MSGID_PROD='ID:tomcat10-XXX-41356-1422538681150-1:95156:1:1' AND MSGID_SEQ='1' AND CONTAINER='queue://XXX_export'

These SQL statements take more than 500 ms. We've had scenarios where they took more than 3 seconds to complete. When they took 500 ms, the queue size was ~1200 messages across all queues (concentrated in one queue), with about 2-3 messages per second being produced and about 2 messages per second being consumed. IMHO the query time scales linearly with the queue size.

We were able to "resolve" the issue by restarting both ActiveMQ instances. After that, the load on the database drops dramatically: instead of 100% CPU usage we see less than 10% on the database, and it recovers very quickly. The ActiveMQ processes look fine too.

My first guess was a missing database index, but the indexes look fine. Besides, restarting the ActiveMQ instances resolves the issue, which is very, very weird to me. I don't think it's a database lock either, because we couldn't see any, and additionally we see 100% CPU usage for the process executing the statement (Postgres spawns a process per connection). That should IMHO (but I'm no database expert) not happen when there's a lock situation.
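
For reference, this is roughly how the index and lock checks can be done from psql. It is only a sketch: the lowercase table name 'activemq_msgs' assumes the table was created with unquoted identifiers, and the literal values are the redacted ones from the statement above. Note that EXPLAIN ANALYZE actually executes the statement, which is harmless for a SELECT.

```sql
-- Actual plan, timing and buffer usage for the slow statement.
-- A sequential scan on ACTIVEMQ_MSGS here would point at an unused index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ID, PRIORITY
  FROM ACTIVEMQ_MSGS
 WHERE MSGID_PROD = 'ID:tomcat10-XXX-41356-1422538681150-1:95156:1:1'
   AND MSGID_SEQ = '1'
   AND CONTAINER = 'queue://XXX_export';

-- Indexes defined on the message table
-- (table name assumed to be stored in lowercase).
SELECT indexname, indexdef
  FROM pg_indexes
 WHERE tablename = 'activemq_msgs';

-- Sessions that are waiting for a lock that has not been granted.
SELECT l.pid, l.locktype, l.mode, a.query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
 WHERE NOT l.granted;
```

An empty result from the last query while the CPU load is at 100% would match what we observed, i.e. no lock waits.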

We're at a loss. Do you guys have an idea?

And one more thing: once every two or three hours a large number of messages (several thousand) is created. The problem described above, however, occurs irregularly, every one or two weeks or so.

Best regards,
Mark
