I did some work last week on optimizing the codec, and I think I've gotten some interesting results.
- The Decoder is now stateless, meaning the same instance can be reused over and over (no more need for one instance per connection). Bozo Dragojefic had already seen how heavy it is to create a Decoder, and recently optimized MessageImpl to always reuse the same instance through ThreadLocals. This optimization goes a step further.

- I have changed the ListDecoders so that you won't need intermediate objects to parse types. For now I have only made Transfer work that way, but I could do the same for all the other types at some point.

- There were a few hotspots that I found in the test, and I refactored accordingly, with no semantic changes.

As a result of these optimizations, DecoderImpl no longer has a setBuffer method. Instead, each read method takes the buffer as a parameter: read(ReadableBuffer ..., old signature).

And speaking of ReadableBuffer: I have introduced the ReadableBuffer interface. When integrating with the broker, I had a situation where I wouldn't have a ByteBuffer, and this interface will let me optimize the parser further later, since I could use a Netty buffer (aka ByteBuf) instead.

You will find these optimizations on my branch on GitHub: https://github.com/clebertsuconic/qpid-proton/tree/optimizations

There are two commits:

I - a micro-benchmark, where I added a test case doing a direct read on the buffer without any framework. I've actually written a simple parser that works for the byte array I have, which is very close to reading directly from the bytes. I used that to compare raw reading and interpreting the buffer against the current framework. I was concerned about the number of intermediate objects, so I used it to measure those differences.
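To illustrate the idea (this is a rough sketch, not the actual proton-j API; the names ReadableBuffer, ByteBufferReader, and StatelessDecoder here are illustrative), the decoder would depend on a minimal read interface and take the buffer on each call instead of holding it via setBuffer, so one instance can serve many connections:

```java
import java.nio.ByteBuffer;

public class ReadableBufferSketch {

    // Minimal read-only buffer abstraction the decoder would depend on,
    // so a Netty ByteBuf-backed implementation could be swapped in later.
    interface ReadableBuffer {
        byte get();
        int getInt();
        boolean hasRemaining();
    }

    // Adapter over a plain java.nio.ByteBuffer.
    static final class ByteBufferReader implements ReadableBuffer {
        private final ByteBuffer buffer;
        ByteBufferReader(ByteBuffer buffer) { this.buffer = buffer; }
        public byte get() { return buffer.get(); }
        public int getInt() { return buffer.getInt(); }
        public boolean hasRemaining() { return buffer.hasRemaining(); }
    }

    // A stateless decoder: the buffer is passed to each read call instead of
    // being stored on the instance, so no per-connection decoder is needed.
    static final class StatelessDecoder {
        int readInt(ReadableBuffer buf) { return buf.getInt(); }
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(8);
        bb.putInt(42).putInt(7).flip();
        ReadableBuffer rb = new ByteBufferReader(bb);
        StatelessDecoder decoder = new StatelessDecoder();
        System.out.println(decoder.readInt(rb)); // 42
        System.out.println(decoder.readInt(rb)); // 7
    }
}
```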
https://github.com/clebertsuconic/qpid-proton/commit/7b2b02649e5bdd35aa2e4cc487ffb91c01e75685

II - a commit with the actual optimizations:
https://github.com/clebertsuconic/qpid-proton/commit/305ecc6aaa5192fc0a1ae42b90cb4eb8ddfe046e

Without these optimizations, my micro-benchmark, parsing 10000000L instances of Transfer without reallocating any buffers, completed on my laptop in:

- 3480 milliseconds, against 750 milliseconds with raw reading

After these optimizations:

- 1927 milliseconds, against 750 milliseconds with raw reading

Notice that this will also reduce the memory footprint of the codec, but I'm not measuring that here.

I'm looking forward to working with this group. I actually had a meeting with Rafi and Ted last week, and I plan to work more closely with you all on this.

Clebert Suconic
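For reference, the shape of a micro-benchmark like the one described above (timing many decode passes over one pre-filled buffer that is rewound rather than reallocated) could look roughly like this. This is a hypothetical sketch, not the benchmark from the commits; the decode step is a stand-in for parsing a Transfer:

```java
import java.nio.ByteBuffer;

public class CodecBenchSketch {

    // Time `iterations` passes over the same buffer; return an accumulated
    // value so the JIT cannot eliminate the reads as dead code.
    static long run(int iterations) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.putLong(1L).putLong(2L).flip();

        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            buffer.rewind();           // reuse the same buffer, no reallocation
            sink += buffer.getLong();  // stand-in for decoding a Transfer
            sink += buffer.getLong();
        }
        return sink;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000;    // the run in the email used 10000000L
        long start = System.currentTimeMillis();
        long sink = run(iterations);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("iterations=" + iterations
                + " elapsed=" + elapsed + "ms sink=" + sink);
    }
}
```

A single wall-clock measurement like this is only a rough signal; warm-up and JIT effects matter, which is why comparing against the raw-read baseline in the same harness is the useful number.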