I've implemented a rough prototype of an optimization to the Java
implementation that serializes Java strings once when optimize_for ==
SPEED, rather than twice. Brief overview:
* Add a private volatile byte cache for each string field.
* When computing the serialized size, serialize the string and store
it in the cache.
* When serializing, use the cache if available, then set the cache to
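
The caching scheme outlined in the list above can be sketched roughly as follows. This is not the actual patch, only a minimal illustration. The class CachedStringMessage, its single field, and the encode counter are hypothetical names. Field tags and size memoization are left out. Because the last bullet is cut off, the sketch assumes the cache is cleared (set to null) once it has been used.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a generated message with a single string field. */
final class CachedStringMessage {
    private final String name;
    // Cached UTF-8 bytes, filled while computing the serialized size.
    private volatile byte[] nameBytes;
    // Counts UTF-8 encodes; present only to make the effect observable.
    private int encodeCount;

    CachedStringMessage(String name) {
        this.name = name;
    }

    private byte[] encode() {
        encodeCount++;
        return name.getBytes(StandardCharsets.UTF_8);
    }

    /** Size of the length-prefixed string (field tag omitted for brevity). */
    int getSerializedSize() {
        byte[] bytes = encode();
        nameBytes = bytes; // keep the bytes for the write that follows
        return varintSize(bytes.length) + bytes.length;
    }

    void writeTo(ByteArrayOutputStream out) {
        byte[] bytes = nameBytes;
        if (bytes == null) {
            bytes = encode(); // no cache available: encode again
        } else {
            nameBytes = null; // assumption: the cache is cleared after use
        }
        writeVarint(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    /** Computes the size first, then writes, as a one-shot serialization would. */
    byte[] toByteArray() {
        ByteArrayOutputStream out = new ByteArrayOutputStream(getSerializedSize());
        writeTo(out);
        return out.toByteArray();
    }

    int getEncodeCount() {
        return encodeCount;
    }

    boolean hasCachedBytes() {
        return nameBytes != null;
    }

    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }
}
```

In this sketch, calling toByteArray() on a fresh message encodes the string once instead of twice. A later writeTo() without a preceding getSerializedSize() falls back to encoding again.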

I benchmarked with the ProtoBench.java program and the sample messages
included in the SVN repository. Brief summary of results:
* Serializing a protocol buffer more than once is possibly slightly
slower (~2-10%). My guess is that because the message has already
memoized its length, the extra conditionals and fields for string
caching just get in the way.
* Serializing a protocol buffer once, with the length prepended, is
significantly faster (~33% for SpeedMessage1, with 10 strings, ~80%
for SpeedMessage2, with lots of strings; the benchmark measures the
time to create the protocol buffer, so the improvement is probably
* Classes are a bit bigger (SpeedMessage1.class with 8 strings: 24802 -
* The Size messages (optimize_for == CODE_SIZE) are unchanged.
* Messages without strings are unchanged.
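
To make the "serialize once" case concrete, here is a rough micro-benchmark sketch. It is not ProtoBench and does not use the protobuf library. StringEncodeBench and its methods are hypothetical names, field tags are omitted, and there is no JIT warm-up control, so any timings it produces are only indicative and will not match the figures above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical harness: length-prefixed strings, encoded twice vs. once. */
final class StringEncodeBench {

    /** Baseline: encode once to compute the size, then again to write. */
    static byte[] encodeTwice(String[] fields) {
        int size = 0;
        for (String s : fields) {
            int len = s.getBytes(StandardCharsets.UTF_8).length; // first encode
            size += varintSize(len) + len;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(size);
        for (String s : fields) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8); // second encode
            writeVarint(out, b.length);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    /** Cached variant: keep the bytes from the size pass and reuse them. */
    static byte[] encodeOnce(String[] fields) {
        byte[][] cache = new byte[fields.length][];
        int size = 0;
        for (int i = 0; i < fields.length; i++) {
            cache[i] = fields[i].getBytes(StandardCharsets.UTF_8);
            size += varintSize(cache[i].length) + cache[i].length;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(size);
        for (byte[] b : cache) {
            writeVarint(out, b.length);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    /** Wall-clock time in nanoseconds for running the task repeatedly. */
    static long timeNanos(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }
}
```

Both methods produce identical bytes. Comparing timeNanos(() -> StringEncodeBench.encodeTwice(fields), n) against the encodeOnce equivalent for the same fields array gives a crude feel for the cost of the second UTF-8 pass.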

Pros:
* Faster serialization of messages that contain strings.

Cons:
* More code (extra fields; conditional checking of the cache)
* More RAM (extra fields)
* Some changes to CodedOutputStream (more code to handle cached
Does this seem like a worthwhile optimization? If so, I'll clean up my
patch a bit and submit it for code review. Thanks,