I did some work last week on optimizing the codec, and I think I've gotten some interesting results.
- The Decoder is now stateless, meaning the same instance can be reused over and over (no more need for one instance per connection). Bozo Dragojefic had already seen how heavy it is to create a Decoder, and recently optimized MessageImpl to always reuse the same instance through ThreadLocals. This optimization goes a step further.

- I have changed the ListDecoders so that you won't need intermediate objects to parse types. For now I have only made Transfer work that way, but I could do the same for all the other types at some point.

- There were a few hotspots that I found in the test, and I refactored accordingly, with no semantic changes.

As a result of these optimizations, DecoderImpl no longer has a setBuffer method. Instead, each read method takes the buffer as a parameter: read(ReadableBuffer ..., old signature).

And speaking of ReadableBuffer: I have introduced the ReadableBuffer interface. When integrating with the broker, I had a situation where I wouldn't have a ByteBuffer, and this interface will let me optimize the parser further later, since I could use a Netty buffer (aka ByteBuf) instead.

You will find these optimizations on my branch on GitHub: https://github.com/clebertsuconic/qpid-proton/tree/optimizations

There are two commits:

I - a micro-benchmark, where I added a test case doing a direct read on the buffer without any framework. I've actually written a simple parser that works for the byte array I have, which is very close to reading directly from the bytes. I used that to compare raw reading and interpreting the buffer against the current framework. I was concerned about the number of intermediate objects, so I used it to measure those differences.
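To illustrate the idea (this is a rough sketch, not the actual proton-j API; the names ReadableBuffer, ByteBufferReader, and StatelessDecoder here are illustrative), the decoder would depend on a minimal read interface and take the buffer on each call instead of holding it via setBuffer, so one instance can serve many connections:

```java
import java.nio.ByteBuffer;

public class ReadableBufferSketch {

    // Minimal read-only buffer abstraction the decoder would depend on,
    // so a Netty ByteBuf-backed implementation could be swapped in later.
    interface ReadableBuffer {
        byte get();
        int getInt();
        boolean hasRemaining();
    }

    // Adapter over a plain java.nio.ByteBuffer.
    static final class ByteBufferReader implements ReadableBuffer {
        private final ByteBuffer buffer;
        ByteBufferReader(ByteBuffer buffer) { this.buffer = buffer; }
        public byte get() { return buffer.get(); }
        public int getInt() { return buffer.getInt(); }
        public boolean hasRemaining() { return buffer.hasRemaining(); }
    }

    // A stateless decoder: the buffer is passed to each read call instead of
    // being stored on the instance, so no per-connection decoder is needed.
    static final class StatelessDecoder {
        int readInt(ReadableBuffer buf) { return buf.getInt(); }
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(8);
        bb.putInt(42).putInt(7).flip();
        ReadableBuffer rb = new ByteBufferReader(bb);
        StatelessDecoder decoder = new StatelessDecoder();
        System.out.println(decoder.readInt(rb)); // 42
        System.out.println(decoder.readInt(rb)); // 7
    }
}
```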
https://github.com/clebertsuconic/qpid-proton/commit/7b2b02649e5bdd35aa2e4cc487ffb91c01e75685

II - a commit with the actual optimizations:
https://github.com/clebertsuconic/qpid-proton/commit/305ecc6aaa5192fc0a1ae42b90cb4eb8ddfe046e

Without these optimizations, my micro-benchmark, parsing 10000000L instances of Transfer without reallocating any buffers, completed on my laptop in:

- 3480 milliseconds, against 750 milliseconds with raw reading

After these optimizations:

- 1927 milliseconds, against 750 milliseconds with raw reading

Notice that this will also reduce the memory footprint of the codec, but I'm not measuring that here.

I'm looking forward to working with this group. I actually had a meeting with Rafi and Ted last week, and I plan to work more closely with you all on this.

Clebert Suconic
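For reference, the shape of a micro-benchmark like the one described above (timing many decode passes over one pre-filled buffer that is rewound rather than reallocated) could look roughly like this. This is a hypothetical sketch, not the benchmark from the commits; the decode step is a stand-in for parsing a Transfer:

```java
import java.nio.ByteBuffer;

public class CodecBenchSketch {

    // Time `iterations` passes over the same buffer; return an accumulated
    // value so the JIT cannot eliminate the reads as dead code.
    static long run(int iterations) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.putLong(1L).putLong(2L).flip();

        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            buffer.rewind();           // reuse the same buffer, no reallocation
            sink += buffer.getLong();  // stand-in for decoding a Transfer
            sink += buffer.getLong();
        }
        return sink;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000;    // the run in the email used 10000000L
        long start = System.currentTimeMillis();
        long sink = run(iterations);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("iterations=" + iterations
                + " elapsed=" + elapsed + "ms sink=" + sink);
    }
}
```

A single wall-clock measurement like this is only a rough signal; warm-up and JIT effects matter, which is why comparing against the raw-read baseline in the same harness is the useful number.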