I've implemented a rough prototype of an optimization to the Java
implementation that serializes Java strings once when optimize_for ==
SPEED, rather than twice. Brief overview:
* Add a private volatile byte[] cache for each string field.
* When computing the serialized size, serialize the string and store
it in the cache.
* When serializing, use the cache if available, then set the cache to
null.
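
To make the overview concrete, here is a minimal sketch of the idea.
It is not the actual patch: the class CachedStringMessage, the field
nameBytesCache and the varint helpers are hypothetical names made up
for illustration. It also assumes a message with a single string
field and leaves out the field tag and the memoized message size that
the real generated code would have.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a generated message with one string field.
final class CachedStringMessage {
    private final String name;

    // UTF-8 bytes saved by getSerializedSize() for the next writeTo().
    private volatile byte[] nameBytesCache;

    CachedStringMessage(String name) {
        this.name = name;
    }

    // Encodes the string once to learn its length, and keeps the bytes.
    int getSerializedSize() {
        byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
        nameBytesCache = bytes;
        return varintSize(bytes.length) + bytes.length;
    }

    // Writes the length-prefixed string, reusing the cached bytes if any.
    void writeTo(OutputStream out) throws IOException {
        byte[] bytes = nameBytesCache;
        if (bytes != null) {
            nameBytesCache = null; // drop the cache so it does not pin memory
        } else {
            // No cache available: fall back to encoding the string again.
            bytes = name.getBytes(StandardCharsets.UTF_8);
        }
        writeVarint(out, bytes.length);
        out.write(bytes);
    }

    // Number of bytes needed to encode a non-negative int as a varint.
    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    // Writes a non-negative int as a base-128 varint.
    static void writeVarint(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }
}
```

Calling getSerializedSize() and then writeTo() encodes the string
once. A second writeTo() without a fresh size computation finds the
cache empty and falls back to encoding the string again, which is the
extra conditional mentioned in the results.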
I benchmarked this with the ProtoBench.java program and the sample
messages included in the SVN repository. Brief summary of results:
* Serializing the same protocol buffer more than once is possibly
slightly slower (~2-10%). My guess is that, since the message already
has its length computed, the extra conditionals and fields for string
caching just get in the way.
* Serializing a protocol buffer once, with the length prepended, is
significantly faster (~33% for SpeedMessage1, with 10 strings, ~80%
for SpeedMessage2, with lots of strings; the benchmark measures the
time to create the protocol buffer, so the improvement is probably
actually larger).
* Classes are a bit bigger (SpeedMessage1.class with 8 strings:
24802 -> 25577)
* Size messages are unchanged.
* Messages without strings are unchanged.
Pros:
* Faster serialization of messages that contain strings.
Cons:
* More code (extra fields; conditional checking of the cache)
* More RAM (extra fields)
* Some changes to CodedOutputStream (more code to handle cached
strings).
Does this seem like a worthwhile optimization? If so, I'll clean up my
patch a bit and submit it for code review. Thanks,
Evan
--
Evan Jones
http://evanjones.ca/