I’ve been working 15-hour days for the last 2-3 weeks trying to resolve
this, so if this is somewhat incoherent it’s probably due to lack of sleep
:-P

I think we’re experiencing a bug in ActiveMQ which is VERY hard to
reproduce but happens regularly in our production setup.

I can’t reproduce it in my test setup because it seems to require real
world data.  Every time I try to do so everything works fine.

It seems you have to have the following:

- a large number of queues which need servicing (> 1000)
- a fairly large number of connections (> 2000)
- message selectors
- a queue that has a large number of messages (5000)

I have my test code now reproducing it…

Everything works FINE if we have just a few messages.  The problems arise
once the queue size grows, at which point selectors stop working.

It seems like *early* connections win.  If I create a connection to
ActiveMQ early, and keep it open, it will work. But new connections don’t
work.  Eventually, the existing connections will fail too.

Basically, it works JUST FINE without message selectors.

I KNOW it’s not my code because I’ve written a basic/simple consumer which
is literally just raw JMS and is < 50 lines of code.

I also know my message selectors should match.  First, they do match some
percentage of the time.  Second, when I consume without the message
selectors, it works.  I have it print the message headers and I can confirm
that they should match.
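
For anyone who wants to repeat that check offline, here is a rough sketch
of comparing printed headers against a selector without involving the
broker.  It is only an illustration under stated assumptions: the class
name, the property names (feed, tier) and the values are made up, and it
only understands selectors of the form  key = 'value' AND key2 = 'value2',
not the full JMS selector syntax.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of JMS or ActiveMQ. */
final class SelectorSanityCheck {

    /**
     * Parses "a = 'x' AND b = 'y'" into {a=x, b=y}.
     * Only string-equality clauses joined by AND are supported;
     * anything else is rejected instead of being silently misread.
     */
    static Map<String, String> parse(String selector) {
        Map<String, String> wanted = new LinkedHashMap<>();
        for (String clause : selector.split("(?i)\\s+AND\\s+")) {
            String[] parts = clause.split("=", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Unsupported clause: " + clause);
            }
            String key = parts[0].trim();
            String value = parts[1].trim();
            if (!key.matches("[A-Za-z_$][A-Za-z0-9_$]*")) {
                throw new IllegalArgumentException("Unsupported property name in: " + clause);
            }
            if (value.length() < 2 || !value.startsWith("'") || !value.endsWith("'")) {
                throw new IllegalArgumentException("Expected a quoted string in: " + clause);
            }
            wanted.put(key, value.substring(1, value.length() - 1));
        }
        return wanted;
    }

    /** True if every key/value pair required by the selector is present in the headers. */
    static boolean matches(String selector, Map<String, String> headers) {
        for (Map.Entry<String, String> e : parse(selector).entrySet()) {
            if (!e.getValue().equals(headers.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Headers as they might be copied from the consumer's debug output (made-up values).
        Map<String, String> headers = new HashMap<>();
        headers.put("feed", "news");
        headers.put("tier", "high");

        System.out.println(matches("feed = 'news' AND tier = 'high'", headers));
        System.out.println(matches("feed = 'news' AND tier = 'low'", headers));
    }
}
```

If a check like this says the headers match but the broker still delivers
nothing, that at least points away from the selector string itself.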

This also seems to get worse over time.  The larger the queue, the less
chance messages will be serviced; eventually it will just lock up entirely.


There are no obvious errors in the ActiveMQ log.  Just messages regarding queue GC.

The box still has about 40% memory free.  So I don’t think it has any issue
with memory.  No OutOfMemoryErrors being logged.

I think another way to debug this could be to restart ActiveMQ itself with
message tracing. Then try to get the queue to this state again, and try to
consume messages and see what’s being logged while it’s failing.

What’s frustrating here is that this is the 3rd ActiveMQ workaround I’ve
had to implement.

The first was because LevelDB was very slow… (artificially slow it seems),
so then I decided to just use the memory store.  But the memory store
doesn’t support priority, so instead, I implemented priority through JMS
selectors.  But now JMS selectors don’t work.

:-/
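
For readers unfamiliar with the idea, priority through selectors can be
sketched roughly as follows.  This is an illustrative assumption, not the
production code: the property name msgPriority, the class name and the
polling strategy are all made up.  The idea is one selector (and so one
consumer) per priority level, polled from highest to lowest.  To keep the
sketch free of broker dependencies, each consumer is represented by a
Supplier that returns null when nothing is available, which is what a
non-blocking JMS receive would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch of selector-based priority; not ActiveMQ API. */
final class SelectorPriorityPoller {

    /** Builds one selector per level, highest first, e.g. "msgPriority = 2". */
    static List<String> selectors(String property, int highest, int lowest) {
        List<String> out = new ArrayList<>();
        for (int p = highest; p >= lowest; p--) {
            out.add(property + " = " + p);
        }
        return out;
    }

    /**
     * Polls the sources in list order and returns the first non-null message.
     * Each source stands in for a consumer created with one of the selectors
     * above; returning null means "nothing available right now".
     */
    static <T> Optional<T> pollInPriorityOrder(List<Supplier<T>> sources) {
        for (Supplier<T> source : sources) {
            T message = source.get();
            if (message != null) {
                return Optional.of(message);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(selectors("msgPriority", 2, 0));

        // Stand-ins for three consumers: the highest level is empty right now.
        List<Supplier<String>> consumers = Arrays.asList(
                () -> null,
                () -> "medium-priority message",
                () -> "low-priority message");
        System.out.println(pollInPriorityOrder(consumers));
    }
}
```

With a scheme like this, every delivery depends on the broker evaluating a
selector, which is why selectors failing on a large queue breaks
prioritisation completely.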

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
