Having had to dig a bit more into the Java code base over the past few
weeks I've noticed some packaging/organizational issues. In particular I
found it difficult to figure out where to look for any given java file as
there doesn't seem to be much rhyme or reason to how classes are split up
between the proton and proton-api jars.

I'm going to pause at this point to address the inevitable flood of mockery
at the fact that I prefer emacs to using an IDE. ;-) I could certainly have
run etags on all of proton-j and lived in blissful ignorance of where files
are actually physically located, however I intentionally avoided that
because I wanted to develop an understanding of the thinking behind the
layout of the codebase. After spending a few weeks of doing this I'm
beginning to suspect there isn't as much thinking as I'd hoped. ;-)

So going back to what I've run into over the past few weeks...

The first thing I noticed was that there were overlapping packages between
proton-j, proton-api, and proton-logging. This is considered bad form by
some and is a non starter if you want to seal packages. I don't know how
much of a problem the package sealing thing is in practice, but in general
the overlap is just confusing for people who are trying to figure out what
stuff belongs where, whether you're a user trying to figure out what jar
you need to access a given class or whether you're a developer trying to
figure out where some new class belongs. This gets particularly confusing
where the proton-j impl adds its own interfaces into the same package that
the proton-api uses for its interfaces.

The second thing I noticed was that there were some classes in the pure
java impl that could actually be useful in the api jar. Specifically,
TransportFrame is a simple value object that ties together channel, body,
and payload. This is used inside a frame callback interface that exists in
the pure java impl, however its not available (and therefore not used)
inside the new logging interface that was recently added to the API.
Instead of using this, the channel, body, and payload are split out into
separate arguments, and the event firing code inside the transport is
duplicated.

The third thing I noticed was that there is a whole lot of stuff in the API
jar underneath the amqp package that doesn't seem to belong in the API jar,
for example all the frame bodies underneath transport are concrete classes.
This obviously would present at least a superficial problem if we were
trying to implement them via JNI.  Even if they were interfaces though I
don't think they would work as part of the common API since the C frame
parsing needs to be fairly different from the Java frame parsing.

As I understand it, the Java frame parsing works by transforming each
binary frame into a brand new Java object, then processing the Java object
and discarding it. This same design in C would not be workable as it would
put mallocs in the critical processing path. The C frame parsing therefore
needs to use a different design. It loads the binary frame into a frame
parser and then accesses/process the content of the frame and then
clears/reuses that same frame parser for the next frame. This means that
there are never any analogs to things like the Open frame created, and the
C implementation couldn't actually usefully support generating such a thing
even if it were an interface. I'd also note that I think it would be
reasonable for an alternative Java implementation to want to generate less
garbage and perhaps use a similar design to the C impl, so I don't think
this is really a C vs Java thing, I'd argue that those concrete classes are
really just part of the current Java impl and should not be part of a
properly abstracted API.

I ran across more issues, but to come back to the packaging/architecture
topic, I think these three pretty clearly illustrate that the current split
between api and impl suffers very much from having originally been all part
of a single impl rather than having been designed/coded with the separate
api/impl discipline in mind. I think this is also complicated by the fact
that the primary motivation for the two impls has really been to facilitate
testing rather than being driven by user requirements.

So the question is, how do we fix these issues going forward? At a minimum
I'd like to address the mechanical/organizational issues of overlapping
packages. Beyond that I think we need to be clear on the purpose of the
API/impl split. If we want the API to be meaningful for more than just
testing purposes, we actually need to put in the full amount of work
necessary to define and maintain a properly abstracted API that is capable
of having truly independent implementations, and as we do that we need to
consider what the real benefit is to users, e.g. why someone might choose
one implementation over another. Short of doing this I personally don't see
much value to the current ad-hoc impl/api split, so I'd like to either
improve it or simply merge the two back together as they really weren't
properly separated out in the first place and there is no reason they need
to be separate purely for testing purposes.

Thoughts?

(apologies for the long/rambling nature of this email)

--Rafael

Reply via email to