Re: Advice for Scheduler refactor.

Kevin Burton Thu, 07 May 2015 11:59:51 -0700

On Thu, May 7, 2015 at 6:30 AM, Tim Bain <tb...@alumni.duke.edu> wrote:


> I agree with your approach with the WeakRunnable; I think that will achieve
> the goal without the performance hit of calling purge() after each
> cancellation.
>
>
I went ahead with this solution and it seems to be working well in
production.
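
For anyone following along, here is a minimal sketch of the general idea behind a WeakRunnable-style wrapper. It is not the actual patch: the class shape and the release() helper are assumptions made for illustration. The point is that a cancelled task still sitting in the Timer queue only pins this small wrapper, not the object graph behind the real Runnable.

```java
import java.lang.ref.WeakReference;

// Illustrative sketch only; the real WeakRunnable in the patch may differ.
final class WeakRunnable implements Runnable {
    // Weak reference: the wrapper alone does not keep the target alive.
    private final WeakReference<Runnable> target;

    WeakRunnable(Runnable target) {
        this.target = new WeakReference<>(target);
    }

    @Override
    public void run() {
        Runnable r = target.get();
        // If the target was collected or released, running is a no-op.
        if (r != null) {
            r.run();
        }
    }

    // Hypothetical helper: drop the reference explicitly on cancellation.
    void release() {
        target.clear();
    }
}
```

Something else has to hold a strong reference to the target for as long as it should keep firing; otherwise it can be collected before it ever runs.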

Since the unit tests take 24 hours to run, and I haven’t actually been
able to get them to run, I gave up on that approach. Instead I’m just being
very careful with my code and making sure we can revert our queue quickly.

So far my changes have been in production for a week and have been rock
solid.

These changes have dramatically improved the performance of ActiveMQ in
production for us.  The current version of ActiveMQ, without our patches,
simply can’t scale to this number of queues.

So far, with my changes, it’s been rock solid, doesn’t suffer the GC
lockup, and GCs are now VERY fast.  They take 30-90 seconds now. Before they
were locking up ActiveMQ for 20 minutes at a time.  Once I had fixed the
lock bugs, the next problem was that queue GC was taking 100% CPU
continually but never actually finishing.

Now queue GC is much, much faster and everything seems to be solid.


> If you take on the synchronized keywords, you should consider whether we're
> worried about concurrent calls to doStart() and doStop().  That scenario
> could leak a Timer, and the current code doesn't protect against that
> (you'd need a flag to say whether you've been stopped, and you'd need to
> check it in doStart() ).
>

Agreed. I decided to not take these on because they simply weren’t showing
up in the profile.  So if it ain’t broke, don’t fix it.
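
For reference, the stopped-flag guard described in the quoted text could look roughly like the sketch below. This is not something in my changes; the class and field names are made up for illustration, and it only uses java.util.Timer as it exists in the JDK.

```java
import java.util.Timer;

// Hypothetical lifecycle sketch; not the ActiveMQ Scheduler class.
final class SchedulerLifecycleSketch {
    private final Object lock = new Object();
    private Timer timer;      // guarded by lock
    private boolean stopped;  // guarded by lock

    void doStart() {
        synchronized (lock) {
            // Refuse to create a Timer after stop, or a second one after start.
            if (stopped || timer != null) {
                return;
            }
            timer = new Timer("scheduler-sketch", true);
        }
    }

    void doStop() {
        synchronized (lock) {
            stopped = true;
            if (timer != null) {
                timer.cancel(); // ends the timer thread, discards queued tasks
                timer = null;
            }
        }
    }

    boolean hasTimer() {
        synchronized (lock) {
            return timer != null;
        }
    }
}
```

With the flag checked under the same lock, a doStart() that races with doStop() either runs first (and its Timer gets cancelled) or runs second (and creates nothing), so no Timer is leaked.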

The only other issue, really, is the reflection cost of creating a queue.
But the performance there is linear at least.  The problem with the current
version of ActiveMQ is that performance falls exponentially because of
CopyOnWriteArrayList being used everywhere.  Refactoring those to use
ConcurrentHashMap seems to completely solve the problem.
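
To illustrate why the data structure matters: every add() or remove() on a CopyOnWriteArrayList copies the whole backing array, so registering N entries costs on the order of N^2 element copies, and contains() is a linear scan. A ConcurrentHashMap-backed set does the same operations in roughly constant time each. The sketch below is a hypothetical registry, not an ActiveMQ class, and it assumes the collection does not need ordering or duplicates.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of queue names; for illustration only.
final class DestinationRegistrySketch {
    // Thread-safe set view backed by a ConcurrentHashMap.
    private final Set<String> names =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    boolean add(String name) {
        return names.add(name);      // no array copy per insertion
    }

    boolean remove(String name) {
        return names.remove(name);   // no linear scan, no copy
    }

    boolean contains(String name) {
        return names.contains(name);
    }

    int size() {
        return names.size();
    }
}
```

The trade-off is that iteration over such a set is weakly consistent rather than a stable snapshot, which is the one thing CopyOnWriteArrayList is good at.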

It’s nice when a plan comes together.

These changes have actually allowed us to ship version 5.0 of our product.
I have to say that ActiveMQ slowed us to market by about 1-1.5 months… but
I think I was also asking ActiveMQ to do a lot.  But realistically, it
should have been able to keep up.  It’s just that internally it was using
the wrong data structures for our use case.

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
