proposal to remove certain features from qpidd

Gordon Sim Thu, 19 Jul 2012 10:54:38 -0700

I have been looking at what would be required to get AMQP 1.0 support alongside AMQP 0-10 support in the C++ broker, i.e. qpidd.

As part of that it became clear some refactoring of the broker codebase would be required[1]. That in turn led me to believe that we should consider dropping certain features. These would be dropped *after* the pending 0.18 release; i.e. they would still be present in 0.18, but that would be the last release in which they were present if my proposal were accepted.

The purpose of this mail is to list the features I would propose to drop and my reasons for doing so. For those who find it overly long, I apologise and offer a very short summary at the end!

In each case the basic argument is that I believe the features are not very well implemented, and keeping them working as part of my refactoring would take extra time that I would rather spend on achieving 1.0 support and making real improvements.

The first feature I propose we drop is the 'legacy' versions of LVQ behaviour. These forced a choice in the behaviour of the queue when browsers (i.e. non-destructive subscribers) received messages from it. The choice was to either have browsers miss updates, or to suppress the replacing of one message by another with a matching key. This choice was really driven by a technical problem with the first implementation. We have since moved to an improved implementation where the distinction is not relevant. I see no good reason to keep the old behaviour any longer.
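
To make the LVQ idea concrete for anyone unfamiliar with it, here is a minimal sketch in Python. It is an illustration only, not qpidd code: the class and method names are hypothetical, and it assumes that a replacement message takes a new position at the tail of the queue, which may not match the broker's actual ordering. It shows the one property that matters here: a browser always sees the latest value per key, so no choice between "miss updates" and "suppress replacement" is needed.

```python
from collections import OrderedDict


class LastValueQueue:
    """Toy last-value queue: at most one message is held per key."""

    def __init__(self):
        # Maps key -> body, ordered from head (oldest) to tail (newest).
        self._messages = OrderedDict()

    def publish(self, key, body):
        # Assumption: a message with a matching key replaces the old one
        # and is re-queued at the tail.
        self._messages.pop(key, None)
        self._messages[key] = body

    def browse(self):
        # Non-destructive read: return a snapshot, leave the queue intact.
        return list(self._messages.items())

    def consume(self):
        # Destructive read: remove and return the head, or None if empty.
        if not self._messages:
            return None
        return self._messages.popitem(last=False)


queue = LastValueQueue()
queue.publish("price.a", 1)
queue.publish("price.b", 2)
queue.publish("price.a", 3)  # replaces the earlier "price.a" message
print(queue.browse())
```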

The second feature is the old async queue replication mechanism. This is very fragile and I believe is no longer necessary given the new and improved HA solution that first appeared in 0.16 and has been improved significantly for 0.18.

The third feature is the 'last man standing' or 'cluster durable' option. The biggest reason for dropping this comes later(!), but considered on its own my concern is that there are no system level tests for it, so it is very hard to guarantee it still works without writing all those tests. I am entirely unconvinced by this solution, and think that again the new HA mechanism would be a better way to achieve this (you could start up a backup node that forced all the replicated messages to disk). I am therefore keen to avoid wasting time and effort.

The fourth feature is - wait for it - the clustered broker capability as enabled by the cluster.so plugin. I believe this is nearing the end of its life anyway. It is currently only available on Linux with no real prospects of being ported to Windows. The design as it turns out was very fragile to changes in the codebase and there are still some difficult to solve bugs within it. A new HA mechanism has been developed (as alluded to above) and I believe that will replace the old cluster. The work needed to keep the cluster working through my refactor is sizeable. It would in any case have the potential to destabilise the cluster (the aforementioned issue with fragility). This seems to me to argue strongly for dropping this in releases after 0.18, and for anyone affected, that would give them some time to try out the new HA and give feedback as well.

The fifth and final feature I propose we drop is the confusingly named 'flow to disk' feature. Now for this one I have no alternative to offer yet. The problem is supporting large queues whose aggregate size far exceeds a bounded amount of memory. I believe the current implementation is next to useless for the majority of cases as it keeps the headers of all messages in memory. It is useless unless your messages are large enough that the overhead of keeping these headers in memory is outweighed by the size of the body (this overhead is significantly larger than the transfer size of the headers). Further, since a common cause for large queues is a short lived disparity between the rate of inflow and outflow, the current solution can compound the problem by radically slowing down consumers even more. I believe there is a better solution and I'm not convinced the current solution is worth the effort of maintaining any further. (I know Kim has been working on a new store interface and removing flow to disk would clean that up nicely as well!)
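
The header-overhead argument can be put into rough numbers with a small Python sketch. The function name and the byte counts are hypothetical and chosen purely for illustration; they are not measurements of qpidd. The point is only that when the body is released to disk but the header overhead stays resident, the fraction of memory still held depends on the ratio of overhead to body size.

```python
def resident_fraction(body_bytes, header_overhead_bytes):
    """Fraction of a message's in-memory footprint that stays resident
    once its body has been written to disk and only headers are kept."""
    total = body_bytes + header_overhead_bytes
    if total == 0:
        return 0.0
    return header_overhead_bytes / total


# Hypothetical sizes: a small message versus a large one, both assumed
# to carry 400 bytes of in-memory header overhead.
for body in (100, 1024 * 1024):
    print(body, resident_fraction(body, 400))
```

With these assumed figures, the small message keeps most of its footprint in memory even after flowing to disk, while the large one keeps almost none, which is the case where the feature does help.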

I hope this makes sense. I'm keen to get any thoughts or feedback on these points. The purpose is not to deprive anyone of features they are using but rather to spend time on more important work.


Summary:

features to drop are:

(i) legacy lvq modes; lvq support would still remain, only the two old and peculiar modes would go; I really doubt anyone actually depends on these anyway, they were more a limitation than a feature

(ii) asynchronous queue replication; solution is not mature enough for real world use anyway due to fragility and inability to resync; new HA mechanism as introduced in 0.16 and improved on in 0.18 should address the need anyway.

(iii) clustering including last-man-standing mode; design is brittle and currently ties it to linux platform; new HA is the long term solution here anyway.


(iv) flow to disk; current solution really doesn't solve the problem anyway

--Gordon

[1] If you are interested at all, you can find my latest patch and some notes on the internal changes up on reviewboard: https://reviews.apache.org/r/5833/

