I've implemented a rough prototype of an optimization to the Java implementation that serializes Java strings once when optimize_for == SPEED, rather than twice. Brief overview:

* Add a private volatile byte[] cache for each string field.
* When computing the serialized size, serialize the string and store it in the cache.
* When serializing, use the cache if available, then set the cache to null.
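
To make the scheme above concrete, here is a minimal sketch of the idea. It is not the actual patch: the class name CachedStringField and its helper methods are hypothetical, the field tag is omitted, and it writes to a plain ByteArrayOutputStream rather than CodedOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for one generated string field (illustration only). */
final class CachedStringField {
    private final String value;

    // UTF-8 bytes saved by getSerializedSize() for the next writeTo() call.
    private volatile byte[] cachedBytes;

    CachedStringField(String value) {
        this.value = value;
    }

    /** Encodes the string once and keeps the bytes for the following write. */
    int getSerializedSize() {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        cachedBytes = bytes;
        return varintSize(bytes.length) + bytes.length;
    }

    /** Writes a varint length prefix followed by the UTF-8 bytes. */
    void writeTo(ByteArrayOutputStream out) {
        byte[] bytes = cachedBytes;
        if (bytes == null) {
            // No cached copy (size was not computed first): encode now.
            bytes = value.getBytes(StandardCharsets.UTF_8);
        } else {
            // Use the cached copy once, then release it.
            cachedBytes = null;
        }
        writeVarint(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    /** Number of bytes needed to encode a non-negative int as a varint. */
    static int varintSize(int n) {
        int size = 1;
        while ((n >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    /** Writes a non-negative int as a base-128 varint. */
    static void writeVarint(ByteArrayOutputStream out, int n) {
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);
    }
}
```

In this sketch, calling getSerializedSize() and then writeTo() encodes the string only once. A second writeTo() without a fresh size computation falls back to encoding again, which is the "serialized more than once" case where the extra field and conditional add overhead without saving any work.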

I benchmarked this with the ProtoBench.java program and the messages included in the SVN repository. Brief summary of results:

* Serializing a protocol buffer more than once is possibly slightly slower (~2-10%). I'm guessing the reason is that since it already has the message length, the extra conditionals and fields for string caching just get in the way.
* Serializing a protocol buffer once, with the length prepended, is significantly faster (~33% for SpeedMessage1, with 10 strings, ~80% for SpeedMessage2, with lots of strings; the benchmark measures the time to create the protocol buffer, so the improvement is probably actually larger).
* Classes are a bit bigger (SpeedMessage1.class with 8 strings: 24802 -> 25577)
* Size messages are unchanged.
* Messages without strings are unchanged.

Pros:
* Faster serialization of messages that contain strings.

Cons:
* More code (extra fields; conditional checking of the cache)
* More RAM (extra fields)
* Some changes to CodedOutputStream (more code to handle cached strings).

Does this seem like a worthwhile optimization? If so, I'll clean up my patch a bit and submit it for code review.

Thanks,
Evan Jones
