I've implemented a rough prototype of an optimization to the Java implementation that serializes Java strings once when optimize_for == SPEED, rather than twice. Brief overview:

* Add a private volatile byte[] cache for each string field.
* When computing the serialized size, serialize the string and store it in the cache.
* When serializing, use the cache if available, then set the cache to null.
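As a rough illustration of the two steps above, here is a minimal, self-contained sketch. It is not the actual patch: the class CachedStringField, its method names, and the encodeCount counter are hypothetical and exist only for this example. The real change would live in the generated message classes and CodedOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for one string field of a generated message.
final class CachedStringField {
    private final String value;

    // Cache of the UTF-8 bytes: filled while computing the size,
    // consumed (and cleared) by the next write.
    private volatile byte[] cachedBytes;

    // Demonstration only: counts how often the string is actually encoded.
    // Not thread-safe; a real implementation would not carry this field.
    private int encodeCount;

    CachedStringField(String value) {
        this.value = value;
    }

    private byte[] encode() {
        encodeCount++;
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Size of the length-delimited field body: varint length plus payload.
    int getSerializedSize() {
        byte[] bytes = encode();
        cachedBytes = bytes; // remember the encoding for the upcoming write
        return varintSize(bytes.length) + bytes.length;
    }

    // Writes the varint length followed by the UTF-8 bytes.
    void writeTo(ByteArrayOutputStream out) {
        byte[] bytes = cachedBytes; // read the volatile field once
        if (bytes == null) {
            bytes = encode();       // no cache: fall back to encoding again
        } else {
            cachedBytes = null;     // release the cache after one use
        }
        writeVarint(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    int getEncodeCount() {
        return encodeCount;
    }

    // Number of bytes needed to encode a non-negative int as a base-128 varint.
    static int varintSize(int v) {
        int n = 1;
        while ((v & ~0x7F) != 0) {
            n++;
            v >>>= 7;
        }
        return n;
    }

    // Writes a non-negative int as a base-128 varint.
    static void writeVarint(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }
}
```

In this sketch, calling getSerializedSize() and then writeTo() encodes the string once. A second writeTo() without a fresh size computation finds the cache empty and encodes again, which matches the "serialize more than once" case where the cache does not help.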

I used the ProtoBench.java program included in the SVN repository, using the messages included in the repository. Brief summary of results:

* Serializing a protocol buffer more than once is possibly slightly slower (~2-10%). I'm guessing the reason is that since it already has the message length, the extra conditionals and fields for string caching just get in the way.
* Serializing a protocol buffer once, with the length prepended, is significantly faster (~33% for SpeedMessage1, with 10 strings, ~80% for SpeedMessage2, with lots of strings; the benchmark measures the time to create the protocol buffer, so the improvement is probably actually larger).
* Classes are a bit bigger (SpeedMessage1.class with 8 strings: 24802 -> 25577).
* Size messages are unchanged.
* Messages without strings are unchanged.


Pros:
* Faster serialization of messages that contain strings.

Cons:
* More code (extra fields; conditional checking of the cache)
* More RAM (extra fields)
* Some changes to CodedOutputStream (more code to handle cached strings).

Does this seem like a worthwhile optimization? If so, I'll clean up my patch a bit and submit it for code review.

Thanks,

Evan

--
Evan Jones
http://evanjones.ca/
