[protobuf] Optimizing protoc for Java

2012-11-28 Thread Ryan Fogarty
I have a repeated primitive field array optimization for the 
protoc-generated Java source, but before I discuss it I would like to gauge 
interest (and get access to the Protocol Buffer Group).

Thanks,
Ryan

-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/protobuf/-/ym9XqRQ9tbMJ.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] Optimizing protoc for Java

2012-11-28 Thread Christopher Smith
Interested.

--Chris

On Nov 19, 2012, at 4:07 AM, Ryan Fogarty ryan.foga...@gmail.com wrote:

 I have a repeated primitive field array optimization for the protoc-generated 
 Java source, but before I discuss I would like to gauge interest (and get 
 access to the Protocol Buffer Group).
 
 Thanks,
 Ryan
 



Re: [protobuf] Optimizing protoc for Java

2012-11-28 Thread Ryan Fogarty
So, to provide a little background: we are using Protobuf to push some 
fairly high-throughput data and thus may be abusing its original design 
intent. We do try to keep messages under a few megabytes by segmenting data 
packets. We are pushing this data from a C++ producer to a Java consumer. 
The C++ side has no problem handling and serializing the data. On the Java 
side, however, we found that accessing the data through the generated 
accessors needlessly creates autoboxed objects that eventually slam our 
GC.
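To make the autoboxing cost concrete, here is a minimal illustration (the class and method names are hypothetical, not protoc output) contrasting a java.util.List<Integer>-backed field with a plain int[]. Note that the wrapper objects are allocated when values are boxed into the list, as happens when a List-backed message is parsed or built; reading through List.get then unboxes each element.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: contrasts a boxed List<Integer> storage path with a
// primitive int[] path. Names are hypothetical, not protoc-generated.
final class BoxingDemo {
    // Each list.add(v) autoboxes v via Integer.valueOf; values outside the
    // Integer cache (-128..127 by default) allocate a new Integer object.
    static List<Integer> boxedCopy(int[] raw) {
        List<Integer> list = new ArrayList<>(raw.length);
        for (int v : raw) {
            list.add(v);
        }
        return list;
    }

    // Reading through List<Integer>.get unboxes each element on access.
    static long sumBoxed(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // The primitive path touches no wrapper objects at all.
    static long sumPrimitive(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] raw = {1000, 2000, 3000};
        System.out.println(sumBoxed(boxedCopy(raw))); // 6000
        System.out.println(sumPrimitive(raw));         // 6000
    }
}
```

Both paths compute the same result; the difference is the three Integer objects the boxed path creates per copy, which at high message rates is what ends up as GC pressure.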

To fix the problem, I started out by generating Java stubs and then 
modifying the problematic fields. However, I found that strategy 
difficult to maintain with each new Protobuf version, and it was also a bit 
precarious: this is a multi-organizational system, and the approach required 
too much expertise (and knowledge transfer). So I finally got around to 
hacking the protoc compiler itself to generate a more efficient Java implementation. 

I have ported the 2.4.1 version of the protoc compiler to use a Java 
<primitivetype>[] as the storage for repeated primitive fields, as an 
alternative to the less efficient java.util.List<PrimitiveBoxedType>. 
Additionally, it looks trivial to port this to the Subversion trunk, though 
trunk has a couple of added features over 2.4.1. <aside>I must compliment the 
authors of the protoc compiler: it was pretty straightforward to make this 
change.</aside>

The changes are entirely located in the file 
$PROTO_HOME/src/google/protobuf/compiler/java/java_primitive_field.cc

The change should improve performance for the (repeated) primitive accessor 
without changing its API:

<primitivetype> get$FIELDNAME$(int index)
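As a sketch of what this could look like for a field declared as "repeated int32 value = 1;", the hypothetical class below (not actual protoc output; SampleMessage, getValue and getValueCount are illustrative names) stores the values in an int[] while keeping the indexed accessor signature unchanged:

```java
import java.util.Arrays;

// Hypothetical sketch of generated code for
//   repeated int32 value = 1;
// using int[] storage. Not actual protoc output.
final class SampleMessage {
    // Backing storage: a plain primitive array, no Integer wrappers.
    private final int[] value_;

    SampleMessage(int[] values) {
        // Defensive copy so the message stays immutable even if the
        // caller later modifies its own array.
        this.value_ = Arrays.copyOf(values, values.length);
    }

    // Same shape as the List-backed accessor: <primitivetype> get$FIELDNAME$(int index).
    // An out-of-range index throws ArrayIndexOutOfBoundsException, a subclass
    // of the IndexOutOfBoundsException that List.get would throw.
    int getValue(int index) {
        return value_[index];
    }

    int getValueCount() {
        return value_.length;
    }

    public static void main(String[] args) {
        SampleMessage msg = new SampleMessage(new int[] {7, 8, 9});
        long sum = 0;
        for (int i = 0; i < msg.getValueCount(); i++) {
            sum += msg.getValue(i);
        }
        System.out.println(sum); // 24
    }
}
```

Because callers only see int in and int out, existing code using the indexed accessor would compile unchanged against this storage.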

but I have also extended the interface with a get...Array() function:

<primitivetype>[] get$FIELDNAME$Array();

Unfortunately, given the lack of const in Java, this little addition 
punches a hole in the immutable pattern used in the Java API, but, perhaps 
significantly for some, it also allows use of the System.arraycopy 
function. I suppose I could have offered my own version of an arraycopy 
in lieu of the get...Array() function (I only just thought of that), 
i.e. copy$FIELDNAME$Array(<primitivetype>[] target, int offset, int length). 
Anyway, I would welcome such suggestions.
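The trade-off between the two accessors can be sketched as follows. This is a hypothetical class, not the patch itself, and it assumes that offset in copy$FIELDNAME$Array refers to the starting position in the target array, which the proposal leaves open.

```java
import java.util.Arrays;

// Hypothetical message class contrasting the two accessor styles.
final class ArrayAccessMessage {
    private final int[] value_;

    ArrayAccessMessage(int... values) {
        this.value_ = Arrays.copyOf(values, values.length);
    }

    // Style 1: get$FIELDNAME$Array(). Returns the internal array itself,
    // so a caller can mutate the "immutable" message through it.
    int[] getValueArray() {
        return value_;
    }

    // Style 2: copy$FIELDNAME$Array(target, offset, length). Assumption:
    // copies the first `length` elements into target starting at
    // target[offset], keeping System.arraycopy speed without exposing state.
    void copyValueArray(int[] target, int offset, int length) {
        System.arraycopy(value_, 0, target, offset, length);
    }

    int getValue(int index) {
        return value_[index];
    }

    public static void main(String[] args) {
        ArrayAccessMessage msg = new ArrayAccessMessage(1, 2, 3);

        // The hole: writing through the returned array changes the message.
        msg.getValueArray()[0] = 42;
        System.out.println(msg.getValue(0)); // 42

        // The copy-based alternative leaves the message untouched.
        int[] buf = new int[5];
        msg.copyValueArray(buf, 2, 3);
        buf[2] = -1;
        System.out.println(Arrays.toString(buf)); // [0, 0, -1, 2, 3]
        System.out.println(msg.getValue(0));      // still 42
    }
}
```

The copy style costs one bulk copy per call but preserves immutability; the array-returning style is zero-copy but relies on callers not writing to the result.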

I have uploaded to my Google Drive 
(https://docs.google.com/open?id=0B6kQ2S7zDGNaWl9OY041MGpwcHc) a rather raw 
patch for these changes (it actually keeps all of the original macro lines, 
commented out, for reference). I hope others might find a use for this 
higher-performance version of the Java interface. Ideally, I would love to 
see this integrated into the official version, but there might be some edge 
cases that I am not considering. There are also some other design issues to 
consider, such as the broken immutability mentioned above and the best 
strategy for growing a primitive array on the Builder side (I should 
probably just mimic the proportional growth algorithm that ArrayList uses 
now).
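For the Builder-side growth question, a minimal sketch of ArrayList-style proportional growth over an int[] might look like this. The class and method names are hypothetical, and the roughly 1.5x factor follows OpenJDK's ArrayList (Java 7 and later); other factors would work too.

```java
import java.util.Arrays;

// Hypothetical builder for a repeated int32 field stored as int[].
// Growth mimics OpenJDK's ArrayList: new capacity is about 1.5x the old.
final class IntArrayBuilder {
    private int[] values = new int[10]; // ArrayList's default capacity is also 10
    private int size = 0;

    IntArrayBuilder addValue(int v) {
        ensureCapacity(size + 1);
        values[size++] = v;
        return this;
    }

    private void ensureCapacity(int minCapacity) {
        if (minCapacity > values.length) {
            // oldCapacity + oldCapacity / 2, as in ArrayList.grow
            int newCapacity = values.length + (values.length >> 1);
            if (newCapacity < minCapacity) {
                newCapacity = minCapacity;
            }
            values = Arrays.copyOf(values, newCapacity);
        }
    }

    int capacity() { return values.length; }

    int size() { return size; }

    // Trim to the exact size so the built message holds no spare slots
    // and shares no storage with the builder.
    int[] build() {
        return Arrays.copyOf(values, size);
    }

    public static void main(String[] args) {
        IntArrayBuilder b = new IntArrayBuilder();
        for (int i = 0; i < 11; i++) {
            b.addValue(i);
        }
        System.out.println(b.capacity());     // 15
        System.out.println(b.build().length); // 11
    }
}
```

Proportional growth keeps appends amortized O(1), and the final trimming copy in build() is what lets the resulting message stay independent of the builder.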

I would love to get some feedback on whether folks might find this useful.

Thanks,
Ryan

P.S. Apologies for the long post.
