To provide a little background, we are using Protobuf to push some fairly
high-throughput data and thus may be abusing its original design intent. We
do try to keep each packet below the multi-megabyte range by segmenting the
data. We are pushing this data from a C++ producer to a Java consumer. The
C++ side has no problem handling and serializing the data. On the Java
side, however, we found that accessing the data through the generated
accessors needlessly creates autoboxed objects, which eventually hammer our
GC.
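As a rough illustration of the overhead described above (not the generated
code itself), the sketch below contrasts a boxed java.util.List<Integer>
with a plain int[]. The class and method names are hypothetical and exist
only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class BoxingSketch {
    // Builds a boxed list: every add(i) autoboxes via Integer.valueOf(i),
    // which typically allocates a wrapper object for values outside the
    // small Integer cache.
    static List<Integer> boxedRange(int n) {
        List<Integer> list = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Builds the same data as a primitive array: one allocation in total.
    static int[] primitiveRange(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        return values;
    }

    // Each get() returns an Integer that is implicitly unboxed.
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (int i = 0; i < values.size(); i++) {
            total += values.get(i);
        }
        return total;
    }

    // No wrapper objects are involved at any point.
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }
}
```

Both variants compute the same result; the difference is the number of
short-lived objects the collector has to deal with at high message rates.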
To fix the problem, I started out by generating the Java stubs and then
modifying the problematic fields by hand. However, I found that strategy
difficult to maintain with each new Protobuf version. It was also a bit
precarious, as this is a multi-organizational system and the approach
required too much expertise (and knowledge transfer). So I finally got
around to hacking the protoc compiler itself to make a more efficient Java
implementation.
I have ported the 2.4.1 version of the protoc compiler to use a Java
primitivetype[ ] as the storage for repeated primitive fields, as an
alternative to the less efficient java.util.List<PrimitiveBoxedType>.
Additionally, it looks trivial to push this to the Subversion trunk, though
the trunk has a couple of added features over 2.4.1. (As an aside, I must
compliment the authors of the protoc compiler: it was pretty
straightforward to make this change.)
The changes are confined to a single file:
    $PROTO_HOME/src/google/protobuf/compiler/java/java_primitive_field.cc
The change should improve performance for the (repeated) primitive accessor
without changing its API:
    primitivetype get$FIELDNAME$(int index)
but I have also extended the interface with a get...Array() function:
    primitivetype[ ] get$FIELDNAME$Array();
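To make the shape of these accessors concrete, here is a minimal
hand-written sketch of an array-backed repeated field. It is not the output
of the patched protoc: the class name SamplesMessage, the field name
"value", and the constructor are all hypothetical, and it assumes the
get...Array() accessor returns the backing array itself rather than a copy.

```java
// Hypothetical stand-in for a generated message with one repeated int32
// field called "value"; not actual protoc output.
class SamplesMessage {
    // Primitive storage instead of java.util.List<Integer>.
    private final int[] value_;

    SamplesMessage(int[] values) {
        // Copy on the way in so the caller's array is not shared.
        this.value_ = values.clone();
    }

    // Same signature as the existing indexed accessor, but no unboxing.
    int getValue(int index) {
        return value_[index];
    }

    int getValueCount() {
        return value_.length;
    }

    // The added accessor. Assumption: it hands out the internal array
    // directly, so a caller could write into the message's storage.
    int[] getValueArray() {
        return value_;
    }
}
```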
Unfortunately, because Java lacks const, this little addition punches a
hole in the immutability pattern used in the Java API, since callers can
modify the returned array. On the other hand, and perhaps significantly for
some, it allows the use of System.arraycopy. I suppose I could have offered
my own arraycopy-style method in lieu of the get...Array() function (I kind
of just thought of that), i.e. copy$FIELDNAME$Array(primitivetype[ ] target,
int offset, int length). Anyway, I would welcome such suggestions.
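A copy-style accessor along those lines might look like the sketch below.
This is only one possible reading of the proposed signature: it assumes
that offset is the start position in the target array and that copying
always begins at element 0 of the field. The class and field names are
hypothetical.

```java
// Hypothetical array-backed message exposing a copy method instead of the
// raw array; not actual protoc output.
class CopyingMessage {
    private final int[] value_;

    CopyingMessage(int[] values) {
        this.value_ = values.clone();
    }

    int getValue(int index) {
        return value_[index];
    }

    // Copies the first `length` elements of the field into `target`,
    // starting at target[offset]. The internal array is never exposed,
    // so the message stays immutable from the caller's point of view.
    void copyValueArray(int[] target, int offset, int length) {
        System.arraycopy(value_, 0, target, offset, length);
    }
}
```

The caller still gets the bulk-copy speed of System.arraycopy, at the cost
of owning (and possibly reusing) its own target buffer.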
I have uploaded a rather raw patch for these changes to my Google Drive
(https://docs.google.com/open?id=0B6kQ2S7zDGNaWl9OY041MGpwcHc); it still
keeps all of the original macro lines, commented out, for reference. I hope
others might find a use for this higher-performance version of the Java
interface. Ideally, I would love to see this integrated into the official
version, but there might be some edge cases that I am not considering.
There are also some other design issues to consider, such as the break in
immutability mentioned above and the best strategy for growing a primitive
array on the Builder side (I should probably just mimic the proportional
growth algorithm of the ArrayList that is used now).
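For the Builder-side growth question, one option is sketched below: grow
the array by roughly 50% whenever it fills up, similar in spirit to what
ArrayList does. The class name, the initial capacity of 10, and the exact
growth formula are assumptions made for this example, not taken from the
patch.

```java
import java.util.Arrays;

// Hypothetical builder-side storage for a repeated int32 field.
class IntArrayBuilder {
    private int[] values = new int[10]; // assumed initial capacity
    private int size = 0;

    // Appends one element, growing the backing array proportionally
    // (about 1.5x) so that repeated adds stay amortized O(1).
    IntArrayBuilder add(int v) {
        if (size == values.length) {
            int newCapacity = values.length + (values.length >> 1) + 1;
            values = Arrays.copyOf(values, newCapacity);
        }
        values[size++] = v;
        return this;
    }

    int size() {
        return size;
    }

    int capacity() {
        return values.length;
    }

    // Returns a trimmed copy so the built array has no unused slots and
    // is not shared with the builder.
    int[] build() {
        return Arrays.copyOf(values, size);
    }
}
```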
I would love to get some feedback on whether folks might find this useful.
Thanks,
Ryan
P.S. Apologies for the long post.