So if LevelDB cleanup during removeDestination() is the presumed culprit, can we spin the LevelDB cleanup work off into a separate thread (better: a task object serviced by a ThreadPool, so we don't fork-bomb ourselves if we remove many destinations at once)?  Then the call to removeDestination() can return quickly and LevelDB can do its record-keeping in the background without blocking message processing.
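Roughly what I'm picturing, as a fragment of the broker class (just a sketch, not real ActiveMQ internals; aside from the java.util.concurrent classes, everything here — destinations, fireAdvisories(), persistenceAdapter.cleanupStore(), LOG — is a hypothetical placeholder for whatever the real code does):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Small bounded pool shared by the broker, so removing many destinations
    // at once just queues up cleanup tasks instead of spawning a thread each.
    private final ExecutorService storeCleanupPool = Executors.newFixedThreadPool(2);

    public void removeDestination(ActiveMQDestination dest) {
        // Fast, in-memory part: drop the destination from the broker's map,
        // send advisories, etc.  This is the only part that should need the lock.
        destinations.remove(dest);
        fireAdvisories(dest);  // hypothetical placeholder

        // Slow part: hand the LevelDB cleanup to the pool and return.  The task
        // holds its own reference to the destination, so it doesn't care that
        // the broker's in-memory records no longer know about it.
        storeCleanupPool.submit(new Runnable() {
            public void run() {
                try {
                    persistenceAdapter.cleanupStore(dest);  // hypothetical placeholder
                } catch (Exception e) {
                    LOG.warn("Background store cleanup failed for " + dest, e);
                }
            }
        });
    }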
I've spent precisely zero time looking at that code, so I'm shooting from the hip, but it feels like we should be able to separate the action of removing the destination from the broker's in-memory list (along with whatever record-keeping that requires, such as advisory messages) from the action of cleaning up the LevelDB message store for that destination.  I would expect LevelDB to be capable of doing that cleanup even after the destination no longer exists in ActiveMQ's in-memory records, as long as the thread doing the LevelDB cleanup kept a reference to the destination object.  But since that's not code I've ever looked at and I don't claim to know the details of what's being done there, maybe there's some race condition or edge case where that wouldn't work (e.g. if the destination is deleted but then re-created before the LevelDB task runs), so hopefully someone who knows that code better can shed light on any potential flaws.

Incidentally, why would LevelDB have to do any non-trivial amount of work to delete a destination that contains no active messages?  If that's not fast, I'm very curious why not.

On Sun, Feb 22, 2015 at 4:03 PM, Kevin Burton <[email protected]> wrote:

> On Sun, Feb 22, 2015 at 2:43 PM, Tim Bain <[email protected]> wrote:
>
> > I like the more-granular lock idea; that's better than holding the lock
> > with brief unlocking periods as I proposed in the email I sent just now
> > to your other thread.  In practice you'll need to do the refactoring I
> > proposed in order to implement the algorithm you've described here, only
> > you'll add a lookup for figuring out the lock for this queue.
>
> I just responded to that email but I agree with you.
>
> > I think this is much better than relying on the low max number of queues
> > and trying to do just a few at a time.  Do you know whether what's slow
> > is the call to canGC() or the call to removeDestination()?  If it's
> > canGC(), the do-it-often-but-only-for-a-few approach will kill you
> > performance-wise because you'll spend large amounts of time holding the
> > lock while calling canGC on active destinations as you look for any that
> > are inactive.  So for multiple reasons, don't do that.
>
> I don’t know yet.  I suspect removeDestination because I always see it
> there in the stack trace, and I also suspect that 99% of this is LevelDB.
> It’s just that this global lock blocking everything means that ActiveMQ is
> dead in the water while GC is happening.
>
> What I was going to do was look at installing a snapshot and then starting
> to work with that snapshot for a while.  I just built it with a tar.gz so
> I can use that to push changes into production easily.
>
> I wanted to put a timer / stopwatch around the ENTIRE gc loop to measure
> the total time it takes (we really need to trace the duration of that) and
> can also add one around canGC and removeDestination.
>
> > Definitely submit an enhancement request in JIRA, even if you're not
> > sure you'll need it.  Let's not lose track of the idea now that you've
> > put all this time into digging into it.
>
> Sounds like a plan.  I have a test project that I can use now to reference
> and I’ll create a JIRA for this.  I have to upgrade to 5.11 anyway, so if
> I can just take the current branch, build it, then create my .deb from it,
> then I can easily push test code into production to verify that we’ve
> resolved this bug.
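P.S. Re: the stopwatch idea above, something as simple as System.nanoTime() around the sweep and around each call should tell us where the time goes.  A sketch only — the loop shape and names like candidates and broker are stand-ins for whatever the real purge loop looks like, and LOG is whatever logger that class already has:

    long sweepStart = System.nanoTime();
    for (Destination dest : candidates) {          // stand-in for the real purge loop
        long t0 = System.nanoTime();
        boolean gc = dest.canGC();
        long t1 = System.nanoTime();
        LOG.info("canGC took " + ((t1 - t0) / 1000) + "us for " + dest);
        if (gc) {
            broker.removeDestination(dest);        // stand-in for the real call
            LOG.info("removeDestination took "
                     + ((System.nanoTime() - t1) / 1000) + "us for " + dest);
        }
    }
    LOG.info("Full GC sweep took "
             + ((System.nanoTime() - sweepStart) / 1000000) + "ms");

That would confirm (or rule out) removeDestination as the slow half without waiting on a LevelDB expert.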
