Hi,
I work with Piotr on this. Let me try to provide some additional
information on our slowdown issue:
Storage is a PostgreSQL 9.3.2 server on a Debian Wheezy system
(kernel 3.2.51-1).
We use JDBC and the PGPoolingDataSource
(org.postgresql.ds.PGPoolingDataSource).
This is the persistenceAdapter configuration:
<persistenceAdapter>
  <jdbcPersistenceAdapter dataDirectory="activemq-data"
      dataSource="#postgres-ds" lockKeepAlivePeriod="0"
      createTablesOnStartup="false" />
</persistenceAdapter>
We have two destination interceptors set up. We also run the demo code
(jetty-demo), because some of our applications use the HTTP/REST
interface it provides. We don't run Camel.
Other than that, it's a pretty mundane setup. We also run two
instances at the same time as a sort of failover. Because of the
JDBC backend, only one of them is active at a time, and clients use
the failover protocol to connect to the active one. We use HAProxy to
serve the web interface from the active instance. Both ActiveMQ
instances run on the same Linux box with different service IP
addresses (they share the same binaries; only the configuration and
data directories are separate). We run two instances because we
previously had serious stability issues, with the ActiveMQ process
more or less hanging itself up. We could probably move away from that
setup, since this hasn't happened with 5.10.
Like the database server, the Linux box that runs the ActiveMQ
instances is Debian Wheezy, but with kernel 3.2.60-1+deb7u1.
Problem description: once in a while we see 100% CPU load on the
database. We can trace it to SQL statements like this one:
SELECT ID, PRIORITY FROM ACTIVEMQ_MSGS
WHERE MSGID_PROD='ID:tomcat10-XXX-41356-1422538681150-1:95156:1:1'
  AND MSGID_SEQ='1' AND CONTAINER='queue://XXX_export'
These statements take more than 500 ms; in some cases they took more
than 3 seconds to complete. At 500 ms, the total queue size across all
queues was ~1200 messages (concentrated in one queue), with about 2-3
messages per second produced and about 2 messages per second consumed.
IMHO, the query time scales linearly with the queue size.
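A quick back-of-the-envelope check of the linear-scaling guess, using
only the numbers above. The linear model itself is an assumption, and
the class and method names are purely illustrative:

```java
/**
 * Rough check of the "query time scales linearly with queue size" guess.
 * Uses only the figures from the report: ~1200 queued messages -> ~500 ms
 * per lookup, 2-3 msg/s produced, ~2 msg/s consumed.
 * The linear model is an assumption, not a measured fact.
 */
public class QueryCostEstimate {

    /** Latency predicted for queueSize messages, scaled linearly from one observation. */
    static double estimateLatencyMs(double observedMessages, double observedMs, double queueSize) {
        return observedMs * queueSize / observedMessages;
    }

    /** Queue size at which the linear model predicts the given latency. */
    static double estimateQueueSize(double observedMessages, double observedMs, double targetMs) {
        return observedMessages * targetMs / observedMs;
    }

    /** Net backlog change in messages per second (positive means the queue grows). */
    static double netGrowthPerSecond(double producedPerSecond, double consumedPerSecond) {
        return producedPerSecond - consumedPerSecond;
    }

    public static void main(String[] args) {
        // Single observation from the report: ~1200 queued messages -> ~500 ms per lookup.
        double observedMessages = 1200;
        double observedMs = 500;

        // Queue size the linear model associates with the 3 s lookups we have seen.
        System.out.printf("3 s lookups would correspond to ~%.0f queued messages%n",
                estimateQueueSize(observedMessages, observedMs, 3000));

        // Production of 2-3 msg/s against ~2 msg/s consumption.
        System.out.printf("Backlog growth: %.1f to %.1f msg/s%n",
                netGrowthPerSecond(2, 2), netGrowthPerSecond(3, 2));
    }
}
```

If the linear guess holds, the 3-second lookups would correspond to
roughly 7200 queued messages, and with production at or slightly above
consumption the backlog never shrinks on its own (it grows by up to
1 message per second).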
We were able to "resolve" the issue by restarting both ActiveMQ
instances. After that, the load on the database drops dramatically:
instead of 100% CPU usage we see less than 10%, and recovery is very
fast. The ActiveMQ processes look fine too.
My first guess was a missing database index, but the indexes look fine.
Besides, restarting the ActiveMQ instances resolves the issue, which is
very, very weird to me. I don't think it's a database lock either,
because we couldn't see any locks, and in addition the PostgreSQL
backend process executing the statement is at 100% CPU (PostgreSQL runs
one backend process per connection). IMHO (but I'm no database expert)
that shouldn't happen in a lock situation, since a process waiting on a
lock should be idle rather than busy.
We're at a loss. Do you guys have an idea?
One more thing: every two or three hours, a large batch of messages
(several thousand) is created. The problem described above, however,
happens irregularly, roughly every one or two weeks.
Best regards,
Mark