I did some work last week on optimizing the codec, and I think I've gotten 
some interesting results.


- The Decoder is now stateless, meaning the same instance can be reused over 
and over (no more need for one instance per connection). Bozo Dragojefic had 
actually seen how heavy it is to create a Decoder and recently optimized 
MessageImpl to always reuse the same instance through ThreadLocals. This 
optimization goes a step further.
- I have changed the ListDecoders so that you won't need intermediate objects 
to parse types. For now I have only made Transfer work that way, but I could 
do the same for all the other types at some point.
- There were a few hotspots that I found in the test, and I have refactored 
accordingly, with no semantic changes.
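The stateless-decoder point above can be sketched roughly like this (the class names here are hypothetical stand-ins, not the actual proton-j API):

```java
// A sketch of per-thread decoder reuse, in the spirit of the MessageImpl
// ThreadLocal optimization mentioned above. DecoderPool and the nested
// DecoderImpl are made-up names; the point is only that a stateless
// decoder can be cached per thread instead of created per connection.
final class DecoderPool {
    // One decoder per thread; this is safe only because the decoder
    // keeps no buffer or connection state between calls.
    private static final ThreadLocal<DecoderImpl> DECODER =
            ThreadLocal.withInitial(DecoderImpl::new);

    static DecoderImpl get() {
        return DECODER.get();
    }

    // Minimal stand-in for the real decoder type.
    static final class DecoderImpl { }
}
```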

As a result of these optimizations, DecoderImpl will no longer have a 
setBuffer method. Instead, each read method will take a ReadableBuffer 
argument in addition to its old signature.
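To illustrate the shape of that change (with a made-up method name; the real proton-j signatures differ):

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the API change. Before, the buffer was bound
// to the decoder (decoder.setBuffer(buf); decoder.readInt();). After,
// the buffer travels with each call, so one decoder instance holds no
// per-connection state and can be shared freely.
final class StatelessDecoder {
    int readInt(ByteBuffer buffer) {
        return buffer.getInt();
    }
}
```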


Speaking of ReadableBuffer: I have introduced the ReadableBuffer interface. 
When integrating with the broker, I hit a situation where I won't have a 
ByteBuffer, and this interface will allow me to further optimize the parser 
later, as I could use a Netty buffer (aka ByteBuf) instead.
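A minimal sketch of what such an abstraction can look like (illustrative names, not the actual proton-j interface); the value is that a Netty ByteBuf adapter could implement the same interface later:

```java
import java.nio.ByteBuffer;

// Illustrative buffer abstraction: the decoder would depend only on
// this interface, never on java.nio.ByteBuffer directly, so other
// backing buffers (e.g. Netty's ByteBuf) can be adapted the same way.
interface ReadableBuffer {
    byte get();
    int getInt();
    int remaining();
}

// Adapter backed by a plain NIO ByteBuffer.
final class ByteBufferReadable implements ReadableBuffer {
    private final ByteBuffer delegate;

    ByteBufferReadable(ByteBuffer delegate) {
        this.delegate = delegate;
    }

    @Override public byte get() { return delegate.get(); }
    @Override public int getInt() { return delegate.getInt(); }
    @Override public int remaining() { return delegate.remaining(); }
}
```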


You will find these optimizations on my branch on GitHub: 
https://github.com/clebertsuconic/qpid-proton/tree/optimizations


There are two commits:

I - a micro benchmark, where I added a test case doing a direct read on the 
buffer without any framework. I actually wrote a simple parser that works for 
the byte array I have, but that's very close to reading directly from the 
bytes.
   I used that to compare raw reading and interpreting the buffer against the 
current framework we had.
   I was concerned about the number of intermediate objects, so I used the 
benchmark to measure those differences.

https://github.com/clebertsuconic/qpid-proton/commit/7b2b02649e5bdd35aa2e4cc487ffb91c01e75685
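The general shape of such a raw-read benchmark might look like this (made-up framing, not the actual test case from the commit):

```java
import java.nio.ByteBuffer;

// Rough sketch of a raw-read micro benchmark: it re-reads the same
// encoded bytes repeatedly, timing only the decode loop and never
// reallocating the buffer. The "parsing" here is just summing ints,
// standing in for interpreting the encoded Transfer frames.
final class RawReadBenchmark {
    static long run(byte[] encoded, long iterations) {
        ByteBuffer buffer = ByteBuffer.wrap(encoded);
        long start = System.currentTimeMillis();
        long checksum = 0; // keeps the JIT from eliminating the loop
        for (long i = 0; i < iterations; i++) {
            buffer.rewind(); // reuse the same buffer, no reallocation
            while (buffer.remaining() >= 4) {
                checksum += buffer.getInt();
            }
        }
        System.out.println("elapsed ms: " + (System.currentTimeMillis() - start));
        return checksum;
    }
}
```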


II - a commit with the actual optimizations:


https://github.com/clebertsuconic/qpid-proton/commit/305ecc6aaa5192fc0a1ae42b90cb4eb8ddfe046e

Without these optimizations, my micro benchmark, which parses 10000000L 
instances of Transfer without reallocating any buffers, completed on my laptop 
in:

- 3480 milliseconds, versus 750 milliseconds with raw reading


After these optimizations:
- 1927 milliseconds, versus 750 milliseconds with raw reading
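For reference, the relative improvement implied by these numbers works out roughly as follows:

```java
// Quick arithmetic on the timings quoted above (values copied from the
// text): the optimizations cut the codec time by about 1.8x, and the
// remaining gap to raw reading shrinks from about 4.6x to about 2.6x.
final class SpeedupMath {
    public static void main(String[] args) {
        double before = 3480, after = 1927, raw = 750; // milliseconds
        System.out.printf("speedup: %.2fx%n", before / after);
        System.out.printf("overhead vs raw, before: %.2fx%n", before / raw);
        System.out.printf("overhead vs raw, after: %.2fx%n", after / raw);
    }
}
```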



Note that this will also reduce the memory footprint of the codec, but I'm 
not measuring that here.

I'm looking forward to working with this group. I actually had a meeting with 
Rafi and Ted last week, and I plan to work more closely with you all on this.

Clebert Suconic