Re: codec updates

Robbie Gemmell Tue, 27 May 2014 14:16:43 -0700

I haven't had time to run any of this or look through it properly yet, but
after a very very quick scroll through...


- Should we not use LinkedHashMap instances, due to the ordering
restriction on equality checks for amqp maps?
- It looked like it only uses list32, array32, string32 for encoding...do
you envisage supporting the others?
- Glancing at the example, this made me do a double take - is that an array
of string with a url in it, or somehow an array of url?
  (I guess I need to look at what the code actually does with this :P)

        enc.putArray(Type.STRING);
        enc.putDescriptor();
        enc.putSymbol("url");
        enc.putString("http://one");
        enc.putString("http://two");
        enc.putString("http://three");
        enc.end();

- For the maps, it feels a little weird adding the keys and values
independently, though I guess that's to let us get away without additional
map-specific methods?

        enc.putMap();
        enc.putString("key");
        enc.putString("value");
        enc.putString("pi");
        enc.putDouble(3.14159265359);
        enc.end();

- Out of interest, did you try larger heap sizes to see how much difference
(if any) there was in their relative performance?
- Well into personal preference territory, can't say I'm a huge fan of the
decoder converting nulls/booleans/etc into integers (albeit, if asked to).
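On the first point, here is a minimal sketch of why LinkedHashMap alone may not be enough. OrderedMaps and orderedEquals are hypothetical names and are not part of the codec2 code. LinkedHashMap preserves insertion order, but it inherits equals() from AbstractMap, which ignores order. An order-sensitive comparison therefore has to walk both maps' entries in iteration order.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class OrderedMaps {
    private OrderedMaps() {}

    // Compares two maps entry by entry in iteration order, so maps holding the
    // same key/value pairs in a different order are NOT considered equal.
    static boolean orderedEquals(Map<?, ?> a, Map<?, ?> b) {
        if (a.size() != b.size()) {
            return false;
        }
        Iterator<? extends Map.Entry<?, ?>> other = b.entrySet().iterator();
        for (Map.Entry<?, ?> e : a.entrySet()) {
            Map.Entry<?, ?> o = other.next();
            if (!Objects.equals(e.getKey(), o.getKey())
                    || !Objects.equals(e.getValue(), o.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("key", "value");
        a.put("pi", 3.14159265359);
        Map<String, Object> b = new LinkedHashMap<>();
        b.put("pi", 3.14159265359);
        b.put("key", "value");
        System.out.println(a.equals(b));         // true: AbstractMap.equals ignores order
        System.out.println(orderedEquals(a, b)); // false: same pairs, different order
    }
}
```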

Robbie


On 24 May 2014 16:49, Rafael Schloming <[email protected]> wrote:

> Hi Everyone,
>
> I've been doing a bit more exploration around some of the codec strategies
> I posted about a few weeks ago and I'd like to share some results.
>
> There are still some gaps to fill in, but all the complex data types
> (lists, maps, arrays, described types, etc) are dealt with for both
> encode/decode. Those are important for evaluating performance as they are
> the most complex to encode/decode and as such they significantly impact
> performance.
>
> You can look at what I've done here:
>
>   - https://github.com/rhs/qpid-proton/tree/codec/proton-j/src/main/java/org/apache/qpid/proton/codec2
>
> Note the codec2 package name is temporary, it's just so it could live
> alongside the existing codec in the same codebase.
>
> I've put together a basic benchmark to compare against the existing codec
> performance here:
>
>   - https://github.com/rhs/qpid-proton/blob/codec/proton-j/src/main/java/org/apache/qpid/proton/codec2/Benchmark.java
>
> The benchmark encodes and decodes a list of 10 integers and a UUID. My hope
> is that this is a reasonable approximation of what is in a common frame,
> e.g. a transfer or flow frame. So far the results are encouraging. On my
> system the new codec is roughly 8 to 9 times faster than the existing codec
> on encode, and about 5 times faster than the existing codec for decode:
>
>   [rhs@venture build]$ java -cp proton-j/proton-j.jar org.apache.qpid.proton.codec2.Benchmark 100000000 all
>   new encode: 9270 millis
>   new decode: 7764 millis
>   existing encode: 78725 millis
>   existing decode: 40175 millis
>
> The above Benchmark invocation is running through 100 million
> encode/decodes and you can see the timing results for a typical run.
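As a cross-check on the quoted ratios, the timings above work out to about 8.5x on encode and 5.2x on decode. Per operation, that is roughly 93 ns versus 787 ns for encode and 78 ns versus 402 ns for decode. The small sketch below reproduces that arithmetic. It assumes each timing covers all 100 million iterations, and BenchmarkMath is a hypothetical helper.

```java
final class BenchmarkMath {
    private BenchmarkMath() {}

    // How many times faster the new codec is than the existing one.
    static double speedup(long existingMillis, long newMillis) {
        return (double) existingMillis / newMillis;
    }

    // Average cost of a single operation in nanoseconds.
    static double nanosPerOp(long totalMillis, long iterations) {
        return totalMillis * 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        long n = 100_000_000L;
        System.out.printf("encode speedup: %.2fx%n", speedup(78725, 9270));
        System.out.printf("decode speedup: %.2fx%n", speedup(40175, 7764));
        System.out.printf("encode ns/op: new %.2f, existing %.2f%n",
                nanosPerOp(9270, n), nanosPerOp(78725, n));
        System.out.printf("decode ns/op: new %.2f, existing %.2f%n",
                nanosPerOp(7764, n), nanosPerOp(40175, n));
    }
}
```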
>
> In addition to the raw performance considerations demonstrated by the
> Benchmark, there are some interesting and potentially key aspects of the
> design that would enable higher performance usage patterns.
>
> The decoder works by scanning the encoded byte stream and calling into
> the data handler when types are encountered. The data handler is not
> actually passed the decoded type, but instead it is passed a Decoder (which
> is just a reference into the stream). The decoder can then be used by the
> handler to extract the desired value from the data stream. This design
> allows for a couple of nice things.
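The callback design described above can be illustrated with a small, self-contained sketch. This is not the codec2 DataHandler/Decoder API. ValueHandler, Cursor, and ValueScanner are hypothetical names, and the scanner understands only four AMQP 1.0 primitive encodings: null (0x40), smallint (0x54), int (0x71), and str8-utf8 (0xa1). The point is that the handler receives a reusable cursor into the buffer rather than a decoded object.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical callback: receives a cursor positioned at a value, not a decoded object.
interface ValueHandler {
    void onValue(Cursor cursor);
}

// A reusable view onto one encoded value; nothing is decoded until the handler asks.
final class Cursor {
    private final ByteBuffer buf;
    private int offset; // index of the current value's constructor byte

    Cursor(ByteBuffer buf) {
        this.buf = buf;
    }

    void moveTo(int offset) {
        this.offset = offset;
    }

    int code() {
        return buf.get(offset) & 0xff;
    }

    int getInt() {
        switch (code()) {
            case 0x54: return buf.get(offset + 1);    // smallint: 1 signed byte
            case 0x71: return buf.getInt(offset + 1); // int: 4 bytes, big-endian
            default: throw new IllegalStateException("not an int encoding");
        }
    }

    // Allocates a String only because the handler explicitly requested one.
    String getString() {
        if (code() != 0xa1) {
            throw new IllegalStateException("not str8-utf8");
        }
        int len = buf.get(offset + 1) & 0xff;
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) {
            bytes[i] = buf.get(offset + 2 + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

final class ValueScanner {
    private ValueScanner() {}

    // Walks a flat sequence of encoded values, calling the handler once per value.
    // The same Cursor is reused, so the scan loop itself allocates nothing per value.
    static void scan(ByteBuffer buf, ValueHandler handler) {
        Cursor cursor = new Cursor(buf);
        int pos = buf.position();
        while (pos < buf.limit()) {
            cursor.moveTo(pos);
            handler.onValue(cursor);
            pos += size(buf, pos); // advance whether or not the handler read the value
        }
    }

    // Encoded size (constructor plus payload) for the few codes this sketch supports.
    private static int size(ByteBuffer buf, int pos) {
        int code = buf.get(pos) & 0xff;
        switch (code) {
            case 0x40: return 1;                             // null
            case 0x54: return 2;                             // smallint
            case 0x71: return 5;                             // int
            case 0xa1: return 2 + (buf.get(pos + 1) & 0xff); // str8-utf8
            default:
                throw new IllegalArgumentException("unsupported code 0x" + Integer.toHexString(code));
        }
    }
}
```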
>
> For one thing there is zero intermediate garbage created by the decoding
> process itself, the only garbage produced is at the request of the handler,
> e.g. if the handler wants to extract a string as a full blown String
> object it is free to do that and it will incur the associated overhead, but
> the handler could also just choose to copy the utf8 bytes directly to some
> final destination and avoid any conversion overhead. This also provides an
> added measure of convenience and robustness since the 'type on the wire'
> can be converted directly to the desired Java type, e.g. if it's an
> integral type on the wire, your handler can just call getInt() or getLong()
> and the decoder will convert/coerce automatically.
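The coercion idea can be sketched as follows. The sketch assumes the caller already knows the offset of a value in a ByteBuffer. Coerce, getLong, and copyUtf8 are hypothetical names, not the codec2 Decoder methods. getLong widens any AMQP 1.0 integral encoding to a Java long. copyUtf8 moves the raw UTF-8 bytes of a string into a destination buffer without building a String.

```java
import java.nio.ByteBuffer;

final class Coerce {
    private Coerce() {}

    // Reads any AMQP 1.0 integral encoding at `offset` and widens it to a long.
    static long getLong(ByteBuffer buf, int offset) {
        int code = buf.get(offset) & 0xff;
        int p = offset + 1; // first payload byte
        switch (code) {
            case 0x43: // uint0
            case 0x44: // ulong0
                return 0L;
            case 0x50: // ubyte
            case 0x52: // smalluint
            case 0x53: // smallulong
                return buf.get(p) & 0xffL;
            case 0x51: // byte
            case 0x54: // smallint
            case 0x55: // smalllong
                return buf.get(p);
            case 0x60: // ushort
                return buf.getShort(p) & 0xffffL;
            case 0x61: // short
                return buf.getShort(p);
            case 0x70: // uint
                return buf.getInt(p) & 0xffffffffL;
            case 0x71: // int
                return buf.getInt(p);
            case 0x80: { // ulong: reject values that do not fit in a signed long
                long v = buf.getLong(p);
                if (v < 0) {
                    throw new ArithmeticException("ulong exceeds Long.MAX_VALUE");
                }
                return v;
            }
            case 0x81: // long
                return buf.getLong(p);
            default:
                throw new IllegalArgumentException("not an integral encoding: 0x" + Integer.toHexString(code));
        }
    }

    // Copies the raw UTF-8 bytes of a str8-utf8 (0xa1) or str32-utf8 (0xb1) value
    // into dst without creating a String; returns the number of bytes copied.
    static int copyUtf8(ByteBuffer src, int offset, ByteBuffer dst) {
        int code = src.get(offset) & 0xff;
        int len;
        int start;
        if (code == 0xa1) {
            len = src.get(offset + 1) & 0xff;
            start = offset + 2;
        } else if (code == 0xb1) {
            len = src.getInt(offset + 1);
            start = offset + 5;
        } else {
            throw new IllegalArgumentException("not a string encoding");
        }
        for (int i = 0; i < len; i++) {
            dst.put(src.get(start + i));
        }
        return len;
    }
}
```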
>
> Another nice thing about this design is that there is minimal decode
> overhead if the handler doesn't decode the type. This makes it possible to
> quite efficiently scan for particular value(s) deep inside an encoded
> stream. For example it should be possible to write a handler that extremely
> efficiently evaluates a predicate against the message properties for things
> like selectors/content based routing rules.
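Here is a hypothetical sketch of how such a scan can stay cheap. AMQP 1.0 encodes the width of every value in the high nibble of its constructor byte. Codes 0x4_ to 0x9_ are fixed widths, and codes 0xa_ to 0xf_ carry a 1- or 4-byte size prefix. A value can therefore be skipped without being decoded. MapScan, valueSize, and findValue are illustrative names, not part of the codec2 code. The sketch assumes a well-formed map8 (0xc1) or map32 (0xd1) whose keys are strings or symbols.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class MapScan {
    private MapScan() {}

    // Total encoded size of the value at pos, derived from the constructor byte.
    static int valueSize(ByteBuffer buf, int pos) {
        int code = buf.get(pos) & 0xff;
        if (code == 0x00) {
            // Described type: 0x00, then the descriptor, then the described value.
            int descriptorSize = valueSize(buf, pos + 1);
            return 1 + descriptorSize + valueSize(buf, pos + 1 + descriptorSize);
        }
        switch (code >>> 4) {
            case 0x4: return 1;  // fixed width, no payload
            case 0x5: return 2;  // fixed width, 1 byte
            case 0x6: return 3;  // fixed width, 2 bytes
            case 0x7: return 5;  // fixed width, 4 bytes
            case 0x8: return 9;  // fixed width, 8 bytes
            case 0x9: return 17; // fixed width, 16 bytes
            case 0xa: case 0xc: case 0xe:
                return 2 + (buf.get(pos + 1) & 0xff); // 1-byte size prefix
            case 0xb: case 0xd: case 0xf:
                return 5 + buf.getInt(pos + 1);       // 4-byte size prefix
            default:
                throw new IllegalArgumentException("bad constructor 0x" + Integer.toHexString(code));
        }
    }

    // Offset of the value stored under the key in the map at mapPos, or -1 if absent.
    // Keys are compared as raw bytes; non-matching values are skipped, never decoded.
    static int findValue(ByteBuffer buf, int mapPos, byte[] keyUtf8) {
        int code = buf.get(mapPos) & 0xff;
        int count;
        int pos;
        if (code == 0xc1) {
            count = buf.get(mapPos + 2) & 0xff;
            pos = mapPos + 3;
        } else if (code == 0xd1) {
            count = buf.getInt(mapPos + 5);
            pos = mapPos + 9;
        } else {
            throw new IllegalArgumentException("not a map8/map32");
        }
        for (int i = 0; i + 1 < count; i += 2) {
            int valuePos = pos + valueSize(buf, pos);
            if (keyEquals(buf, pos, keyUtf8)) {
                return valuePos;
            }
            pos = valuePos + valueSize(buf, valuePos);
        }
        return -1;
    }

    // Convenience overload; a real selector would encode the key once up front.
    static int findValue(ByteBuffer buf, int mapPos, String key) {
        return findValue(buf, mapPos, key.getBytes(StandardCharsets.UTF_8));
    }

    private static boolean keyEquals(ByteBuffer buf, int pos, byte[] key) {
        int code = buf.get(pos) & 0xff;
        int len;
        int start;
        if (code == 0xa1 || code == 0xa3) {        // str8-utf8 / sym8
            len = buf.get(pos + 1) & 0xff;
            start = pos + 2;
        } else if (code == 0xb1 || code == 0xb3) { // str32-utf8 / sym32
            len = buf.getInt(pos + 1);
            start = pos + 5;
        } else {
            return false; // non-string keys never match
        }
        if (len != key.length) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (buf.get(start + i) != key[i]) {
                return false;
            }
        }
        return true;
    }
}
```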
>
> It should also be possible to write a handler that very efficiently copies
> a data stream while modifying only a few values, e.g. copy a message from
> an input buffer to an output buffer while updating just the ttl and
> delivery count and adding some sort of trace header. We could even extend
> the design to allow extremely efficient in-place modification of fixed
> width values if we find that to be useful.
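A minimal sketch of the in-place idea, assuming the offset of the value has already been found by a scan. A fixed-width value such as a uint (0x70) or smalluint (0x52) can be overwritten without moving any other bytes, provided the new value fits the width already on the wire. Otherwise the value would need re-encoding. AMQP 1.0 encodes the header's ttl and delivery-count fields as uint, which makes them natural candidates. InPlace, patchUInt, and copyWithPatch are hypothetical names.

```java
import java.nio.ByteBuffer;

final class InPlace {
    private InPlace() {}

    // Overwrites an encoded unsigned int at `pos` if the new value fits the width
    // already on the wire; returns false when a re-encode would be needed instead.
    static boolean patchUInt(ByteBuffer buf, int pos, long value) {
        int code = buf.get(pos) & 0xff;
        switch (code) {
            case 0x52: // smalluint: 1-byte payload
                if (value < 0 || value > 0xff) {
                    return false;
                }
                buf.put(pos + 1, (byte) value);
                return true;
            case 0x70: // uint: 4-byte payload
                if (value < 0 || value > 0xffffffffL) {
                    return false;
                }
                buf.putInt(pos + 1, (int) value);
                return true;
            default: // uint0 (0x43) has no payload to patch; other codes are not uints
                return false;
        }
    }

    // Copies the readable bytes of `in` into a new buffer and patches one uint in the
    // copy. `pos` is relative to in.position(), which becomes index 0 of the copy.
    static ByteBuffer copyWithPatch(ByteBuffer in, int pos, long value) {
        ByteBuffer out = ByteBuffer.allocate(in.remaining());
        out.put(in.duplicate()); // duplicate() leaves the input's position untouched
        out.flip();
        if (!patchUInt(out, pos, value)) {
            throw new IllegalStateException("value needs re-encoding");
        }
        return out;
    }
}
```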
>
> In addition to these lower level usage scenarios, it is also quite
> straightforward to transform an encoded data stream into a full blown
> object representation if performance is less critical. The codec includes a
> POJOBuilder which implements the DataHandler interface and transforms an
> AMQP byte stream into simple Java objects. I've put together an example
> usage here:
>
>   - https://github.com/rhs/qpid-proton/blob/codec/proton-j/src/main/java/org/apache/qpid/proton/codec2/Example.java
>
> I'd like to get people's feedback on these ideas. I would like this codec
> layer to be usable/useful as a first class API in its own right, and not
> just an implementation detail of the protocol engine. If people are happy
> with the design and the API, I think it would be a relatively
> straightforward process to generate some DataHandler implementations from
> the protocol XML that would effectively replace the existing codec layer in
> the engine and hopefully provide a significant performance improvement as a
> result.
>
> --Rafael
>
