I've been wasting my time doing various microbenchmarks when I should  
be doing "real" work. This message describes some "failed" attempted  
optimizations, ideally so others don't waste their time.

I was looking at the Java CodedOutputStream implementation, and was
interested that it uses an internal byte[] array buffer, since this
duplicates what BufferedOutputStream already does. Additionally, the
JVM internally uses 8192 as the "magic" buffer size inside
BufferedOutputStream, and in the native code that actually reads and
writes files and sockets. I tried two tweaks, both of which turned out
worse than the existing code:

a) Change the default buffer size from 4096 to 8192 bytes.
b) Remove the internal buffer and rely on OutputStream.
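
For anyone unfamiliar with the pattern, here is a minimal sketch of
what "internal buffer" means in (b). This is not the CodedOutputStream
source: the class name TinyBufferedWriter and its methods are made up
for illustration, and only the general idea (copy into a private
byte[], hand it to the OutputStream when full) comes from the
description above. The buffer size passed to the constructor is the
knob that (a) changes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical illustration only; not the actual CodedOutputStream code. */
final class TinyBufferedWriter {
    private final OutputStream out;
    private final byte[] buffer;
    private int position = 0;

    TinyBufferedWriter(OutputStream out, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.out = out;
        this.buffer = new byte[bufferSize];
    }

    /** Appends one byte, flushing first if the private buffer is full. */
    void writeRawByte(byte value) throws IOException {
        if (position == buffer.length) {
            flush();
        }
        buffer[position++] = value;
    }

    /** Hands the buffered bytes to the underlying stream in a single call. */
    void flush() throws IOException {
        out.write(buffer, 0, position);
        position = 0;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // 4096 is the default mentioned in (a); 8192 is the alternative tried.
        TinyBufferedWriter writer = new TinyBufferedWriter(sink, 4096);
        for (int i = 0; i < 10000; i++) {
            writer.writeRawByte((byte) i);
        }
        writer.flush();
        System.out.println(sink.size() + " bytes written");
    }
}
```

Tweak (b) amounts to deleting the private array and calling
out.write() directly for every value, leaving any buffering to
whatever OutputStream the caller supplied.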

System: Intel Xeon E5540 (Core i7/Nehalem) @ 2.53 GHz, Linux 2.6.29
Java: Both Sun 1.6.0_16-b01 and 1.7.0-ea-b74; 64-bit; always using - 

Benchmark: Using ProtoBench, with my own extensions to write to
/dev/null using FileOutputStream, and
BufferedOutputStream(FileOutputStream)
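
A rough sketch of the kind of /dev/null comparison described above,
for anyone who wants to try something similar. This is not ProtoBench
or my extensions to it: the class SinkBench, the method timeWrites,
the 200-byte payload and the iteration count are all placeholders, and
a plain byte[] stands in for a serialized message. It assumes a
Unix-like system where /dev/null exists.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical harness; ProtoBench itself is not reproduced here. */
final class SinkBench {
    /** Writes the payload `iterations` times and returns elapsed nanoseconds. */
    static long timeWrites(OutputStream out, byte[] payload, int iterations)
            throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            out.write(payload);
        }
        out.flush();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        File sink = new File("/dev/null");  // assumes a Unix-like system
        byte[] payload = new byte[200];     // stand-in for a small message
        int iterations = 1_000_000;

        // Repeat several rounds so later rounds run JIT-compiled code;
        // the first round or two should be treated as warm-up.
        for (int round = 0; round < 5; round++) {
            try (OutputStream raw = new FileOutputStream(sink);
                 OutputStream buffered =
                     new BufferedOutputStream(new FileOutputStream(sink))) {
                long rawNanos = timeWrites(raw, payload, iterations);
                long bufferedNanos = timeWrites(buffered, payload, iterations);
                System.out.printf("round %d: raw %d ms, buffered %d ms%n",
                    round, rawNanos / 1_000_000, bufferedNanos / 1_000_000);
            }
        }
    }
}
```

Timings from a loop like this vary a lot from run to run, so treat
differences of a percent or two with suspicion.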

Summary of results:

a) Bigger buffer size: small messages are slightly slower, large  
messages are slightly faster. The difference is ~1-2% at most, so this  
could just be "noise." I also tried a 2048 byte buffer, and it also  
makes approximately no difference.

b) Using OutputStream instead of internal buffer: For the small
message, serializing to byte[] is slower, but serializing to /dev/null
is much faster (roughly 30% faster). However, for the large message,
it makes everything a fair bit slower (at least 10% worse).

bonus) jdk7 has the same results, except it is generally faster than  


* None of these optimizations is a clear win.

* 8192 is not always the right buffer size for Java (although it  
should be a maximum for anything that might call  
OutputStream.write()). I'm guessing the reason making the buffer  
bigger hurts performance is due to the extra allocation/deallocation  
cost for all the temporary CodedOutputStreams.

* HotSpot doesn't magically optimize as much as you might like:
BufferedOutputStream does the same thing as CodedOutputStream's
internal byte[] buffer, but HotSpot can't optimize that code as well.
I'm guessing this is because the dynamic dispatch on OutputStream
prevents aggressive inlining?

* Results are somewhat variable, and are of course data dependent.
More benchmarks should be done before making a performance-related
code change.


Evan Jones
