I've been doing a bit more exploration around some of the codec strategies
I posted about a few weeks ago and I'd like to share some results.
There are still some gaps to fill in, but all the complex data types
(lists, maps, arrays, described types, etc.) are dealt with for both
encode and decode. Those are important for evaluating performance as they
are the most complex to encode/decode and as such they significantly
impact overall performance.
You can look at what I've done here:
Note the codec2 package name is temporary; it's just so the new code could
live alongside the existing codec in the same codebase.
I've put together a basic benchmark to compare against the existing codec.
The benchmark encodes and decodes a list of 10 integers and a UUID. My hope
is that this is a reasonable approximation of what is in a common frame,
e.g. a transfer or flow frame. So far the results are encouraging. On my
system the new codec is roughly 8 to 9 times faster than the existing codec
on encode, and about 5 times faster than the existing codec for decode:
[rhs@venture build]$ java -cp proton-j/proton-j.jar
org.apache.qpid.proton.codec2.Benchmark 100000000 all
new encode: 9270 millis
new decode: 7764 millis
existing encode: 78725 millis
existing decode: 40175 millis
The above Benchmark invocation is running through 100 million
encode/decodes and you can see the timing results for a typical run.
In addition to the raw performance considerations demonstrated by the
Benchmark, there are some interesting and potentially key aspects of the
design that would enable higher performance usage patterns.
The way the decoder works is that it scans the encoded byte stream and
calls into the data handler when types are encountered. The data handler
is not actually passed the decoded value; instead it is passed a Decoder
(which is just a reference into the stream). The handler can then use the
decoder to extract the desired value from the data stream. This design
allows for a couple of nice things.
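Since the post doesn't show the actual codec2 interfaces, here is a minimal standalone sketch of the design being described: the scanner fires a callback when it encounters a type, handing the handler a cursor into the stream rather than a decoded value. All names (DataHandler, Decoder, the 0x54 tag, the wire layout) are illustrative assumptions, not the real codec2 API.

```java
import java.nio.ByteBuffer;

public class DecoderSketch {
    // Hypothetical handler: called when an int is encountered; the value
    // has NOT been decoded yet, the handler decides whether to decode it.
    interface DataHandler {
        void onInt(Decoder d);
    }

    // Hypothetical decoder: just a cursor into the stream; decoding
    // happens only when the handler asks for a value.
    static final class Decoder {
        private final ByteBuffer buf;
        Decoder(ByteBuffer buf) { this.buf = buf; }
        int getInt() { return buf.getInt(); }                      // decode on demand
        void skipInt() { buf.position(buf.position() + 4); }       // or skip, paying no decode cost
    }

    // Illustrative wire format for the sketch: 1-byte tag (0x54 = int),
    // followed by a 4-byte big-endian value.
    static void scan(ByteBuffer buf, DataHandler handler) {
        Decoder d = new Decoder(buf);
        while (buf.hasRemaining()) {
            byte tag = buf.get();
            if (tag == 0x54) handler.onInt(d);  // handler must consume or skip the value
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(10);
        buf.put((byte) 0x54).putInt(42);
        buf.put((byte) 0x54).putInt(7);
        buf.flip();
        int[] sum = {0};
        scan(buf, d -> sum[0] += d.getInt());
        System.out.println(sum[0]); // prints 49
    }
}
```

The key point the sketch tries to capture is that the scanner itself allocates nothing; whether decoding happens at all is entirely the handler's choice.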
For one thing there is zero intermediate garbage created by the decoding
process itself; the only garbage produced is at the request of the handler.
For example, if the handler wants to extract a string as a full-blown String
object it is free to do that and will incur the associated overhead, but
the handler could also just choose to copy the utf8 bytes directly to some
final destination and avoid any conversion overhead. This also provides an
added measure of convenience and robustness, since the 'type on the wire'
can be converted directly to the desired Java type, e.g. if it's an
integral type on the wire, your handler can just call getInt() or getLong()
and the decoder will convert/coerce automatically.
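A small standalone sketch of that coercion idea: the decoder remembers which encoding it saw on the wire and widens it to whatever integral type the handler asks for. The tag values and method names here are assumptions for illustration, not the actual codec2 API.

```java
import java.nio.ByteBuffer;

public class CoercionSketch {
    static final class Decoder {
        private final ByteBuffer buf;
        private byte tag;
        Decoder(ByteBuffer buf) { this.buf = buf; }

        // Advance to the next tagged value. Illustrative tags:
        // 0x50 = 1-byte unsigned value, 0x54 = 4-byte int.
        void next() { tag = buf.get(); }

        // Coerce whatever integral width is on the wire up to a long.
        long getLong() {
            switch (tag) {
                case 0x50: return buf.get() & 0xFF;
                case 0x54: return buf.getInt();
                default:   throw new IllegalStateException("not an integral type");
            }
        }

        int getInt() { return (int) getLong(); }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(7);
        buf.put((byte) 0x50).put((byte) 200);  // compact encoding on the wire
        buf.put((byte) 0x54).putInt(100000);   // wider encoding
        buf.flip();
        Decoder d = new Decoder(buf);
        d.next();
        System.out.println(d.getLong()); // prints 200, widened from one byte
        d.next();
        System.out.println(d.getInt());  // prints 100000
    }
}
```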
Another nice thing about this design is that there is minimal decode
overhead if the handler doesn't decode the type. This makes it possible to
quite efficiently scan for particular value(s) deep inside an encoded
stream. For example, it should be possible to write a handler that very
efficiently evaluates a predicate against the message properties for things
like selectors/content-based routing rules.
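To make the selector idea concrete, here is a hedged sketch of a scan that skips values it doesn't care about and decodes only the ints needed to answer a predicate, allocating no intermediate objects. The encoding (0x54 = 4-byte int, 0xA1 = utf8 string with a 1-byte length prefix) is an illustrative assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class PredicateScanSketch {
    // Returns true if any encoded int exceeds the threshold. Strings are
    // skipped in place: no String object is ever materialized.
    static boolean anyIntGreaterThan(ByteBuffer buf, int threshold) {
        while (buf.hasRemaining()) {
            byte tag = buf.get();
            if (tag == 0x54) {
                if (buf.getInt() > threshold) return true;   // decode only ints
            } else if (tag == (byte) 0xA1) {
                int len = buf.get() & 0xFF;
                buf.position(buf.position() + len);          // skip, zero allocation
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(32);
        buf.put((byte) 0xA1).put((byte) 5)
           .put("hello".getBytes(StandardCharsets.UTF_8));
        buf.put((byte) 0x54).putInt(42);
        buf.put((byte) 0x54).putInt(150);
        buf.flip();
        System.out.println(anyIntGreaterThan(buf, 100)); // prints true
    }
}
```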
It should also be possible to write a handler that very efficiently copies
a data stream while modifying only a few values, e.g. copy a message from
an input buffer to an output buffer while updating just the ttl and
delivery count and adding some sort of trace header. We could even extend
the design to allow extremely efficient in-place modification of fixed
width values if we find that to be useful.
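The copy-while-modifying pattern can be sketched in the same standalone style: stream bytes from an input buffer to an output buffer, rewriting only one fixed-width field along the way. The layout (the first tagged int standing in for a ttl) is purely an assumption for illustration, not how an AMQP header is actually laid out.

```java
import java.nio.ByteBuffer;

public class CopyModifySketch {
    // Copy all tagged ints from in to out, rewriting only the first one
    // (standing in for a ttl field) and passing the rest through unchanged.
    static void copyWithNewTtl(ByteBuffer in, ByteBuffer out, int newTtl) {
        boolean first = true;
        while (in.hasRemaining()) {
            byte tag = in.get();
            out.put(tag);
            if (tag == 0x54) {
                int v = in.getInt();
                out.putInt(first ? newTtl : v);  // modify one field, copy the rest
                first = false;
            }
        }
    }

    public static void main(String[] args) {
        ByteBuffer in = ByteBuffer.allocate(10);
        in.put((byte) 0x54).putInt(30);  // "ttl"
        in.put((byte) 0x54).putInt(5);   // "delivery count", copied unchanged
        in.flip();
        ByteBuffer out = ByteBuffer.allocate(10);
        copyWithNewTtl(in, out, 60);
        out.flip();
        out.get(); System.out.println(out.getInt()); // prints 60
        out.get(); System.out.println(out.getInt()); // prints 5
    }
}
```

In-place modification of fixed-width values, as mentioned above, would be even cheaper: the same rewrite without the copy.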
In addition to these lower level usage scenarios, it is also quite
straightforward to transform an encoded data stream into a full blown
object representation if performance is less critical. The codec includes a
POJOBuilder which implements the DataHandler interface and transforms an
AMQP byte stream into simple java objects. I've put together an example
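Since the actual POJOBuilder source isn't reproduced here, the following is a minimal sketch of the same idea: a pass over the stream that materializes plain Java objects when convenience matters more than allocation. The tags and method names mimic, rather than reproduce, the real POJOBuilder/DataHandler API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PojoBuilderSketch {
    // Walk the stream and build full-blown Java objects, paying the
    // allocation cost the lower-level handlers avoid.
    static List<Object> build(ByteBuffer buf) {
        List<Object> out = new ArrayList<>();
        while (buf.hasRemaining()) {
            byte tag = buf.get();
            if (tag == 0x54) {                       // illustrative int tag
                out.add(buf.getInt());
            } else if (tag == (byte) 0xA1) {         // illustrative utf8 tag
                int len = buf.get() & 0xFF;
                byte[] b = new byte[len];
                buf.get(b);
                out.add(new String(b, StandardCharsets.UTF_8));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put((byte) 0x54).putInt(7);
        buf.put((byte) 0xA1).put((byte) 2).put((byte) 'h').put((byte) 'i');
        buf.flip();
        System.out.println(build(buf)); // prints [7, hi]
    }
}
```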
I'd like to get people's feedback on these ideas. I would like this codec
layer to be usable/useful as a first-class API in its own right, and not
just an implementation detail of the protocol engine. If people are happy
with the design and the API, I think it would be a relatively
straightforward process to generate some DataHandler implementations from
the protocol XML that would effectively replace the existing codec layer in
the engine and hopefully provide a significant performance improvement as a
result.