On Jan 10, 2018 12:42 PM, "neon18" <neo...@nngo.net> wrote:

We run the broker with max heap of 4G and initial of 1G (-Xmx4G -Xms1G).
We use non-persistent messages on these particular queues (3 of them in this
test).
The number of messages sent to the broker in my last "flood gate" test was
around 40,000 (40k) in 5 minutes or about 8K msgs/min. After this flood of
messages, the producers send messages at a much much lower rate. I have
pretty much the factory default activemq.xml with
systemUsage/memoryUsage/percentOfJvmHeap=70 and queuePrefetch=20 on these 3
queues.
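
For reference, here is roughly what that looks like in activemq.xml (this is
a sketch of the relevant bits rather than a copy of the actual file; the
queue=">" wildcard stands in for the 3 queue names, and the tempUsage shown
is the 5.x factory default):

    <systemUsage>
      <systemUsage>
        <!-- broker memory available for in-flight (e.g. non-persistent) messages -->
        <memoryUsage>
          <memoryUsage percentOfJvmHeap="70"/>
        </memoryUsage>
        <!-- backs tmp_storage, used when non-persistent messages spill to disk -->
        <tempUsage>
          <tempUsage limit="50 gb"/>
        </tempUsage>
      </systemUsage>
    </systemUsage>

    <destinationPolicy>
      <policyMap>
        <policyEntries>
          <!-- default consumer prefetch for matching queues -->
          <policyEntry queue=">" queuePrefetch="20"/>
        </policyEntries>
      </policyMap>
    </destinationPolicy>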


Thanks for this info. 4GB max heap is fine, I wanted to make sure you
weren't running with something too small like 256MB.

8K msgs/min is roughly 133/sec. How big are the messages? And what storage type (local
SCSI-attached SSD, NFS share, AWS EBS volume, whatever) are you using for
the data directory, and what is its throughput capacity? I was originally
thinking you filled 4GB of RAM in 6 seconds instead of 6 minutes, and I was
thinking that your storage solution might not be capable of accepting over
5Gbps. Even so, you describe running through 2GB of RAM in 20 seconds
(under the 50% scenario) and 3.2GB in 40 seconds (under the 20% scenario),
which both equate to over 500Mbps. So I'd like to pull this thread to make
sure your storage really can accept writes at the rate you're asking it to
do them.
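
To spell out the arithmetic behind those numbers: with a 4GB heap, the 50%
memoryUsage limit leaves roughly 2GB of heap headroom and the 20% limit
leaves roughly 3.2GB, so 2GB in 20 seconds is 100 MB/s (about 800 Mbps) and
3.2GB in 40 seconds is 80 MB/s (about 640 Mbps), both comfortably over
500Mbps.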

So I have seen two different scenarios when lots of non-persistent messages
are put on a queue:
1. Async Writer Thread Shutdown errors (with no prior warnings/errors), then
OutOfMemoryErrors.
2. INFO PListStore: ...tmp_storage initialized, then ~10 seconds later WARN:
IOException: OutOfMemoryError: GC overhead limit exceeded ... ActiveMQ
Transport: tcp:..., which then repeats and other errors follow. There are no
warnings/errors prior to the tmp_storage init info log msg. FYI: the web
console was responsive until I saw the tmp_storage initialized (KahaDB) INFO
msg (~4.5 minutes into my test), then it stopped responding. The last count
of messages on the queues via the web console was ~30K msgs under the
ActiveMQ 5.15.2 broker. Under the 5.14.5 broker, I was able to see the flood
of ~40K msgs added to the 3 queues in ~6 minutes.

In more controlled testing in the past 2 days, where I clear the AMQ_DATA
dir before each test run, I have not seen issue #1 (Async Writer Thread
Shutdown / OutOfMemoryError). I see issue #2 (OutOfMemoryError) a few
seconds after KahaDB tmp_storage is initialized; then the web console stops
responding, and lots of OoM errors and other errors appear in the activemq.log.


For now, let's focus on issue #2 and hope that #1 was somehow a secondary
failure caused by #2 (so fixing #2 fixes #1). Once we have #2 fixed, we can
address #1 if it reoccurs.

Running with the ActiveMQ 5.14.5 and 5.12.2 brokers, we do not get any
OutOfMemoryErrors with this same load or even higher load; we only see them
under ActiveMQ 5.15.2. With the 5.15.2 broker it seems like there might be an
issue with throttling the producers of the queue when the broker hits the
configured memoryUsage limit (default of 70% of the JVM heap).

Following up on that thought, I did another test with
systemUsage/memoryUsage/percentOfJvmHeap=50, but saw the same thing (except
that the OoM error occurred about 20 seconds after the tmp_storage init info
log).

So I ran the test again with systemUsage/memoryUsage set to 20%: same thing,
except the OoM error occurred about 40 seconds after the tmp_storage init
info log. This time I also monitored the memory percent used and temp memory
percent used via the web console. Right after the tmp_storage init info log I
saw memUsed=39% tempUsed=1%, ~10 seconds later memUsed=56% tempUsed=2%, ~10
seconds later memUsed=69% tempUsed=2%, and then the next refresh failed; and
of course in the activemq.log I saw the OutOfMemoryErrors and other warnings
and errors appearing.


If you send the same total number of messages but at a slower rate, does
everything work as expected?
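
For reference, the mechanism that's supposed to do that throttling is
producer flow control, which is configured per destination in the policy
entries. A minimal sketch (the queue=">" wildcard is illustrative, and true
is already the default for queues):

    <destinationPolicy>
      <policyMap>
        <policyEntries>
          <!-- block/slow producers when this destination's memory limit is reached -->
          <policyEntry queue=">" producerFlowControl="true"/>
        </policyEntries>
      </policyMap>
    </destinationPolicy>

IIRC, when flow control isn't slowing the producers down, non-persistent
messages get spooled to tmp_storage once the memory limit is reached, which
would line up with the PListStore initialization you're seeing.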

Also, I grepped my old logs for "Journal failed" and did see some results,
but they happened after the first few OutOfMemoryErrors, so I did not include them
in this thread.


If the OOME is thrown from within that code, those errors could even be
caused by the OOME. As long as the OOME happened first, we can ignore them as
you've suggested. Just be aware that they indicate that the writer thread is
dead (and they cause it to kill itself, IIRC), so if they happen first they
could explain why it's not running; watch for that.

I can pretty reliably recreate the problem in about 6 minutes (with a clean
amq_data_dir) when running the ActiveMQ 5.15.2 broker, and I see no issues
under the 5.14.5 or 5.12.2 brokers.


Is building the 5.15.3 branch from source, to see whether the fix Gary
referred to helps, an option? I'm not proposing you use that in operations,
just that you try it in a test environment to see if it fixes the problem.
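
If it helps, building it should be something along these lines (the branch
name here is from memory, so double-check it against the repo; the assembled
distribution should end up under assembly/target):

    # clone the ActiveMQ source and build the 5.15.x maintenance branch
    git clone https://github.com/apache/activemq.git
    cd activemq
    git checkout activemq-5.15.x
    mvn clean install -DskipTests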

Regards,

Neon



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
