We have found the main cause for this issue. Current Master and new
Master seem to do the exact same things on startup.
They read all the journals.
It seems that the OS is caching the files in memory in some cases.
The master seems to have all the jorunal files in the cache. So when it
restarts it is fast.
The new master seems to not have the journals in cache so they have to
be transfered over the network (~10MB/s) in our case. So this
causes the difference in startup time.
One big question remains:
Why are the whole journals read? Shouldn't a master that is shut down in
a controlled way leave a clean state, so on startup only the meta data
index is read?
Christian
On 15.02.2016 13:32, Christian Schneider wrote:
At a customer we are experiencing a very strange behaviour of activemq.
We have two brokers on separate machines A and B. Storage is on a
third machine and imported using nfsv4.
Kahadb contains about 18 GB of journal files.
Scenario 1:
- Stop B
- Stop A (master now)
- Start A
The start of A takes about 50 seconds.
Scnario 2:
- Stop A
- Stop B (master now)
- Start A
The start of A takes about 5 minutes.
In the slow case the log at debug level shows a big gap between adding
the last queue (12:04) and running the ActiveMQ Journal Checkpoint
Worker again after the queue creation (12:09).
I have no idea what activemq is doing in the mean time but it seems to
take several minutes.
Can anyone explain this behaviour or explain what activemq does in the
mean time? I would also be happy about pointer to the code.
What I do not understand is why it makes a difference if the master is
started again or if the slave is becoming master. As they both share
the same kahadb over nfs I would expect the same startup
behaviour.
Christian
-----
Slow start
2016-02-15 12:03:01,943 | INFO | Refreshing
org.apache.activemq.xbean.XBeanBrokerFactory$1@17b14695: startup date
[Mon Feb 15 12:03:01 CET 2016]; root of context hierarchy |
org.apache.activemq.xbean.XBeanBrokerFactory$1 | main
...
2016-02-15 12:03:03,018 | INFO |
PListStore:[/path/activemq/data/A/tmp_storage] started |
org.apache.activemq.store.kahadb.plist.PListStoreImpl | main
2016-02-15 12:03:03,033 | INFO | ignoring zero length, partially
initialised journal data file: db-1.log number = 1 , length = 0 |
org.apache.activemq.store.kahadb.disk.journal.Journal | main
2016-02-15 12:03:03,177 | INFO | JobSchedulerStore:path/scheduler
started |
org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl | main
2016-02-15 12:03:03,180 | INFO | JobScheduler using directory:
path/scheduler | org.apache.activemq.broker.BrokerService | main
2016-02-15 12:03:03,335 | INFO | Using Persistence Adapter:
KahaDBPersistenceAdapter[/path] |
org.apache.activemq.broker.BrokerService | main
2016-02-15 12:03:03,350 | INFO | JMX consoles can connect to
service:jmx:rmi:///jndi/rmi://localhost:1617/jmxrmi |
org.apache.activemq.broker.jmx.ManagementContext | JMX connector
2016-02-15 12:03:05,198 | INFO | Corrupt journal records found in
'path/db-1122.log' between offsets: 17893102..17893732 |
org.apache.activemq.store.kahadb.disk.journal.Journal | main
2016-02-15 12:03:47,925 | INFO | KahaDB is version 5 |
org.apache.activemq.store.kahadb.MessageDatabase | main
....
2016-02-15 12:03:49,233 | INFO | Recovering from the journal ... |
org.apache.activemq.store.kahadb.MessageDatabase | main
2016-02-15 12:03:49,237 | INFO | Recovery replayed 20 operations from
the journal in 0.094 seconds. |
org.apache.activemq.store.kahadb.MessageDatabase | main
2016-02-15 12:04:26,863 | DEBUG | Checkpoint started. |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-02-15 12:04:26,932 | DEBUG | Checkpoint done. |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-02-15 12:04:26,940 | INFO | Apache ActiveMQ 5.10.2 (A,
ID:A-59167-1455534183070-1:1) is starting |
org.apache.activemq.broker.BrokerService | main
...
2016-02-15 12:04:30,289 | DEBUG | A adding destination: queue://q1.DLQ
| org.apache.activemq.broker.region.AbstractRegion | main
2016-02-15 12:04:57,026 | DEBUG | queue://q1.DLQ expiring messages ..
| org.apache.activemq.broker.region.Queue | ActiveMQ Broker[A] Scheduler
2016-02-15 12:04:57,029 | DEBUG | q1.DLQ toPageIn: 5, Inflight: 0,
pagedInMessages.size 0, pagedInPendingDispatch.size 0, enqueueCount:
0, dequeueCount: 0, memUsage:0 |
org.apache.activemq.broker.region.Queue | ActiveMQ Broker[A] Scheduler
2016-02-15 12:09:20,440 | DEBUG | Checkpoint started. |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-02-15 12:09:20,476 | DEBUG | Checkpoint done. |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-02-15 12:09:20,488 | DEBUG | queue://q1.DLQ expiring messages
done. | org.apache.activemq.broker.region.Queue | ActiveMQ Broker[A]
Scheduler
--
Christian Schneider
http://www.liquid-reality.de
Open Source Architect
http://www.talend.com