I have been looking at what would be required to get AMQP 1.0 support
alongside AMQP 0-10 support in the C++ broker, i.e. qpidd.
As part of that it became clear some refactoring of the broker codebase
would be required[1]. That in turn led me to believe that we should
consider dropping certain features. These would be dropped *after* the
pending 0.18 release; i.e. they would still be present in 0.18, but that
would be the last release in which they were present if my proposal were
accepted.
The purpose of this mail is to list the features I would propose to drop
and my reasons for doing so. For those who find it overly long, I
apologise and offer a very short summary at the end!
In each case the basic argument is that I believe the features are not
very well implemented and keeping them working as part of my refactoring
would take extra time that I would rather spend on achieving 1.0 support
and making real improvements.
The first feature I propose we drop is the 'legacy' versions of LVQ
behaviour. These forced a choice in the behaviour of the queue when
browsers (i.e. non-destructive subscribers) received messages from it.
The choice was to either have browsers miss updates, or to suppress the
replacing of one message by another with a matching key. This choice was
really driven by a technical problem with the first implementation. We
have since already moved to an improved implementation where the
distinction is not relevant. I see no good reason to keep the old
behaviour any longer.
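
To illustrate the behaviour being discussed, here is a minimal sketch of
last-value-queue semantics in Python. It is not the qpidd implementation:
the class and method names are hypothetical, and moving a replaced
message to the tail of the queue is a choice made for this sketch only.
It shows a newer message replacing an older one with a matching key,
while a browser can still read the queue without removing anything.

```python
from collections import OrderedDict


class LastValueQueue:
    """Toy last-value queue: at most one message is retained per key.

    Hypothetical illustration only; not how qpidd implements LVQ.
    """

    def __init__(self):
        # Maps key -> latest body, kept in queue order (oldest first).
        self._messages = OrderedDict()

    def publish(self, key, body):
        # A message with a matching key replaces the older one.
        # Assumption for this sketch: the replacement goes to the tail.
        self._messages.pop(key, None)
        self._messages[key] = body

    def browse(self):
        # Non-destructive read: returns a snapshot, removes nothing.
        return list(self._messages.items())

    def consume(self):
        # Destructive read: removes and returns the oldest message,
        # or None if the queue is empty.
        if not self._messages:
            return None
        return self._messages.popitem(last=False)
```

The 'legacy' modes forced a trade-off between what `browse` sees and
whether `publish` is allowed to replace; in a sketch like this one the
two operations do not interfere, which is the point made above.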
The second feature is the old async queue replication mechanism. This is
very fragile and I believe is no longer necessary given the new and
improved HA solution that first appeared in 0.16 and has been improved
significantly for 0.18.
The third feature is the 'last man standing' or 'cluster durable'
option. The biggest reason for dropping this comes later(!), but
considered on its own my concern is that there are no system level tests
for it so it is very hard to guarantee it still works without writing
all those tests. I am entirely unconvinced by this solution, and think
that again the new HA mechanism would be a better way to achieve this
(you could start up a backup node that forced all the replicated
messages to disk). I am therefore keen to avoid wasting time and effort.
The fourth feature is - wait for it - the clustered broker capability as
enabled by the cluster.so plugin. I believe this is nearing the end of
its life anyway. It is currently only available on Linux with no real
prospects of being ported to Windows. The design as it turns out was
very fragile to changes in the codebase and there are still some
difficult-to-solve bugs within it. A new HA mechanism has been developed
(as alluded to above) and I believe that will replace the old cluster.
The work needed to keep the cluster working through my refactor is
sizeable. It would in any case have the potential to destabilise the
cluster (the aforementioned issue with fragility). This seems to me to
argue strongly for dropping this in releases after 0.18, and for anyone
affected, that would give them some time to try out the new HA and give
feedback as well.
The fifth and final feature I propose we drop is the confusingly named
'flow to disk' feature. Now for this one I have no alternative to offer
yet. The problem is supporting large queues whose aggregate size far
exceeds a bounded amount of memory. I believe the current implementation
is next to useless for the majority of cases as it keeps the headers of
all messages in memory. It is useless unless your messages are large
enough that the overhead of keeping these headers in memory is outweighed
by the size of the body (this overhead is significantly larger than the
transfer size of the headers). Further, since a common cause for large
queues is a short lived disparity between the rate of inflow and
outflow, the current solution can compound the problem by radically
slowing down consumers even more. I believe there is a better solution
and I'm not convinced the current solution is worth the effort of
maintaining any further. (I know Kim has been working on a new store
interface and removing flow to disk would clean that up nicely as well!)
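
The header-overhead argument can be made concrete with a small
back-of-the-envelope sketch. The function names and any byte counts you
plug in are hypothetical, not measurements from qpidd; the only
assumption taken from the text above is that the per-message header
overhead stays in memory even when the body is written to disk.

```python
def resident_bytes(message_count, header_overhead, body_size, flow_to_disk):
    """Approximate memory held by a queue of equally sized messages.

    header_overhead: in-memory cost per message of keeping its headers
                     (hypothetical figure, in bytes).
    body_size:       size of each message body, in bytes.
    flow_to_disk:    if True, bodies are on disk and only headers stay
                     in memory.
    """
    per_message = header_overhead if flow_to_disk else header_overhead + body_size
    return message_count * per_message


def memory_saved_fraction(header_overhead, body_size):
    """Fraction of per-message memory freed by moving the body to disk."""
    return body_size / (header_overhead + body_size)
```

With small bodies relative to the header overhead, the saved fraction
is small and memory use still grows linearly with queue depth; only
when bodies are much larger than the overhead does the fraction
approach 1.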
I hope this makes sense. I'm keen to get any thoughts or feedback on
these points. The purpose is not to deprive anyone of features they are
using but rather to spend time on more important work.
Summary:
features to drop are:
(i) legacy LVQ modes; LVQ support would still remain, only the two old
and peculiar modes would go; I really doubt anyone actually depends on
these anyway, they were more a limitation than a feature
(ii) asynchronous queue replication; solution is not mature enough for
real world use anyway due to fragility and inability to resync; new HA
mechanism as introduced in 0.16 and improved on in 0.18 should address
the need anyway.
(iii) clustering including last-man-standing mode; design is brittle and
currently ties it to the Linux platform; new HA is the long term solution
here anyway.
(iv) flow to disk; current solution really doesn't solve the problem anyway
--Gordon
[1] If you are interested at all, you can find my latest patch and some
notes on the internal changes up on reviewboard:
https://reviews.apache.org/r/5833/