I've been wasting my time doing various microbenchmarks when I should
be doing "real" work. This message describes some "failed" attempted
optimizations, ideally so others don't waste their time.
I was looking at the Java CodedOutputStream implementation and noticed
that it keeps its own internal byte array buffer, which duplicates
what BufferedOutputStream already does. Additionally, the JVM uses
8192 as the "magic" buffer size, both inside BufferedOutputStream and
in the native code that actually reads and writes files and sockets. I
tried two tweaks, and both turned out worse than the existing code:
a) Change the default buffer size from 4096 to 8192 bytes.
b) Remove the internal buffer and rely on OutputStream.
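To make the two designs concrete, here is a minimal sketch. It is not
the protobuf code: VarintWriters, InternalBufferWriter and DirectWriter
are hypothetical names, and only varint encoding is shown. The first
writer accumulates bytes in a private array of configurable size (the
existing design, and the knob that tweak (a) turns); the second hands
every byte straight to the OutputStream (tweak (b)), on the assumption
that the caller wraps the stream in a BufferedOutputStream if needed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class VarintWriters {
    /** Existing-style design: bytes accumulate in a private array. */
    static final class InternalBufferWriter {
        private final OutputStream out;
        private final byte[] buffer;   // bufferSize must be > 0
        private int position;

        InternalBufferWriter(OutputStream out, int bufferSize) {
            this.out = out;
            this.buffer = new byte[bufferSize];
        }

        void writeVarint32(int value) {
            // Emit 7 bits at a time, setting the high bit while more follow.
            while ((value & ~0x7F) != 0) {
                writeRawByte((value & 0x7F) | 0x80);
                value >>>= 7;
            }
            writeRawByte(value);
        }

        private void writeRawByte(int b) {
            if (position == buffer.length) {
                flush();               // array is full: hand it downstream
            }
            buffer[position++] = (byte) b;
        }

        void flush() {
            try {
                out.write(buffer, 0, position);
                position = 0;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Tweak (b): no private array; each byte is a call on the OutputStream. */
    static final class DirectWriter {
        private final OutputStream out;

        DirectWriter(OutputStream out) {
            this.out = out;
        }

        void writeVarint32(int value) {
            try {
                while ((value & ~0x7F) != 0) {
                    out.write((value & 0x7F) | 0x80);
                    value >>>= 7;
                }
                out.write(value);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    static byte[] encodeBuffered(int[] values, int bufferSize) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        InternalBufferWriter writer = new InternalBufferWriter(sink, bufferSize);
        for (int v : values) {
            writer.writeVarint32(v);
        }
        writer.flush();
        return sink.toByteArray();
    }

    static byte[] encodeDirect(int[] values) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DirectWriter writer = new DirectWriter(sink);
        for (int v : values) {
            writer.writeVarint32(v);
        }
        return sink.toByteArray();
    }
}
```

Both writers produce identical bytes; the only difference is where the
buffering happens and how many calls cross the OutputStream interface.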
System: Intel Xeon E5540 (Core i7/Nehalem) @ 2.53 GHz, Linux 2.6.29
Java: Both Sun 1.6.0_16-b01 and 1.7.0-ea-b74; 64-bit; always using -
Benchmark: Using ProtoBench, with my own extensions to write to
/dev/null using FileOutputStream, and
BufferedOutputStream(FileOutputStream)
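For anyone who wants to try something similar without ProtoBench, a
rough sketch of the timing loop follows. It is not the ProtoBench code:
SinkTiming and DiscardSink are hypothetical names, DiscardSink is a
portable stand-in for FileOutputStream on /dev/null, and the 200-byte
payload is an arbitrary placeholder rather than a real serialized
message. Treat any numbers it prints as indicative only.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class SinkTiming {
    /** Discards all bytes; stands in for FileOutputStream("/dev/null"). */
    static final class DiscardSink extends OutputStream {
        long bytes;

        @Override
        public void write(int b) {
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            bytes += len;
        }
    }

    /** Writes the payload `iterations` times, flushes, returns elapsed ns. */
    static long timeWrites(OutputStream out, byte[] payload, int iterations) {
        long start = System.nanoTime();
        try {
            for (int i = 0; i < iterations; i++) {
                out.write(payload, 0, payload.length);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[200];   // placeholder "small message"
        int iterations = 1000000;

        // Warm up so the JIT has compiled the loop before measuring.
        for (int i = 0; i < 5; i++) {
            timeWrites(new DiscardSink(), payload, iterations);
            timeWrites(new BufferedOutputStream(new DiscardSink()),
                       payload, iterations);
        }

        long raw = timeWrites(new DiscardSink(), payload, iterations);
        long buffered = timeWrites(
            new BufferedOutputStream(new DiscardSink()), payload, iterations);
        System.out.println("raw sink:      " + raw / 1000000 + " ms");
        System.out.println("buffered sink: " + buffered / 1000000 + " ms");
    }
}
```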
Summary of results:
a) Bigger buffer size: small messages are slightly slower, large
messages are slightly faster. The difference is ~1-2% at most, so this
could just be "noise." I also tried a 2048 byte buffer, and it also
makes approximately no difference.
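One way to see what the buffer size actually changes is to count how
often the underlying stream gets called. The sketch below uses the
JDK's BufferedOutputStream as a stand-in for CodedOutputStream's
internal buffer; BufferSizeCalls and CountingSink are hypothetical
names. It only counts calls and does not measure time, so it says
nothing about the allocation cost of the buffer itself.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class BufferSizeCalls {
    /** Counts how many write calls reach the underlying stream. */
    static final class CountingSink extends OutputStream {
        int writeCalls;
        long bytes;

        @Override
        public void write(int b) {
            writeCalls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++;
            bytes += len;
        }
    }

    /** Writes totalBytes one byte at a time through a buffer of bufferSize. */
    static int downstreamCalls(int totalBytes, int bufferSize) {
        CountingSink sink = new CountingSink();
        try {
            BufferedOutputStream out = new BufferedOutputStream(sink, bufferSize);
            for (int i = 0; i < totalBytes; i++) {
                out.write(i);          // only the low 8 bits are written
            }
            out.flush();               // push out whatever is still buffered
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.writeCalls;
    }

    public static void main(String[] args) {
        for (int size : new int[] {2048, 4096, 8192}) {
            System.out.println(size + " byte buffer -> "
                + downstreamCalls(100000, size) + " downstream writes");
        }
    }
}
```

Doubling the buffer roughly halves the number of downstream calls, but
for messages smaller than the buffer there is only one call either way,
which fits with the small differences reported above.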
b) Using OutputStream instead of the internal buffer: for the small
message, serializing to a byte array is slower, but serializing to
/dev/null is much faster (roughly 30%). For the large message,
however, everything gets a fair bit slower (at least 10% worse).
bonus) jdk7 has the same results, except it is generally faster than
* None of these optimizations is a clear win.
* 8192 is not always the right buffer size for Java (although it
should be a maximum for anything that might call
OutputStream.write()). I'm guessing that a bigger buffer hurts
performance because of the extra allocation/deallocation cost for all
the temporary CodedOutputStreams.
* HotSpot doesn't magically optimize as much as you might like: using
BufferedOutputStream does the same thing as CodedOutputStream's
internal byte buffer, but HotSpot can't optimize that code as well.
I'm guessing this is because the dynamic dispatch on OutputStream
prevents aggressive inlining?
* Results are somewhat variable, and are of course data dependent.
More benchmarks should be done before making a performance related