Re: [protobuf] Re: Java implementation questions

2010-08-10 Thread Kenton Varda
Another big advantage of writing the size first is that you can potentially
write parsers which skip over sub-messages quickly, and perhaps parse them
lazily.  Currently the Google-provided implementations do not implement
this, but there are some third-party implementations that use lazy parsing.
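A rough sketch of the skipping idea, in plain Java and not taken from any protobuf implementation: because the length comes first, a parser can jump over a sub-message in one step and keep only its offset for later (lazy) parsing. The class and method names here are made up for illustration.

```java
/** Sketch: skip a length-delimited field without decoding its contents. */
final class SkipSketch {
    /** Reads a base-128 varint starting at pos[0] and advances pos[0] past it. */
    static int readVarint32(byte[] buf, int[] pos) {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = buf[pos[0]++];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    /**
     * Assumes pos[0] points just past the tag of a length-delimited field.
     * Returns the offset where the payload starts and moves pos[0] past the
     * payload, so a lazy parser could remember the offset and decode later.
     */
    static int skipLengthDelimited(byte[] buf, int[] pos) {
        int length = readVarint32(buf, pos);
        int payloadStart = pos[0];
        pos[0] += length; // jump over the whole sub-message at once
        return payloadStart;
    }
}
```

With a group there is no length to read, so the parser has to walk every field until it finds the matching end tag.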

On Thu, Aug 5, 2010 at 4:18 PM, Jason Hsueh jas...@google.com wrote:

 Groups are primarily deprecated because it used to be the case that groups
 could not be reused in other types. Now groups can be reused, but other
 messages must use the length-delimited format, rather than the group format.
  In theory, the fact that groups do not require a length prefix makes them
 more attractive. In practice, you typically end up computing message sizes
 during serialization anyway (see below), so there is no benefit to using
 groups. Messages are preferred stylistically: with a group you define a
 message type and a field at once. I'll admit there are applications where
 the group format is useful, specifically to stream a serialization by
 constructing the message on the fly. But internally, there doesn't seem to
 be a lot of demand for this.

 There are a number of performance optimizations (in C++, anyway) that
 depend on having the total size before the data is serialized. When writing
 to a string, having the size allows you to preallocate the array, avoiding
 multiple reallocs. For output to abstract streams, it is often the case that
 a message can fit into the buffer space available in the output, in which
 case the faster code path serializing to a flat array can be used. This also
 applies to embedded messages, so even if the parent message doesn't fit,
 messages farther down the structure can still use the faster path. These
 turned out to be pretty significant gains. (Again, C++ only. I don't know if
 there are similar benefits in Java.)

 On Thu, Aug 5, 2010 at 6:53 AM, Evan Jones ev...@mit.edu wrote:

 On Aug 5, 2010, at 9:16 , Ralf wrote:

 I might be mistaken, but didn't groups use this approach - use a
 special tag to indicate the end of a message? As only tags are
 checked, there is no need to escape any data.


 Good point, I forgot about groups. They definitely do use that approach.
 Maybe one of the Googlers on this list will have a better idea about why
 groups are now deprecated in favour of nested messages.



  Anyway, I was referring more to the implementation. For example, we
 could first serialize the message to a ByteArrayOutputStream, then
 write the result and its size to the output. Obviously this approach
 is much slower, but I was wondering if there were other similar
 approaches.



 Yes, you could do something like this. If you have some way to efficiently
 copy the bytes, it might be a win to use this approach and avoid computing
 sizes altogether.



 That's true, and would work. The other option would be to use fixed width
 integers for the lengths, so then you could reserve space in the buffer,
 serialize the message, then go back and fill in the length field. This would
 be an incompatible change to the serialization format, however.


 In fact, we used to use a similar approach where we would assume that sub
 messages are small. The code would leave a gap for the (varint-encoded) size
 and serialize the message, then go back and fill in the length. If the
 assumption was wrong, the data would have to be shifted. Using cached byte
 sizes turned out to be a win in most cases though, particularly in complex
 message structures.



-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] Re: Java implementation questions

2010-08-05 Thread Evan Jones

On Aug 5, 2010, at 9:16 , Ralf wrote:

I might be mistaken, but didn't groups use this approach - use a
special tag to indicate the end of a message? As only tags are
checked, there is no need to escape any data.


Good point, I forgot about groups. They definitely do use that  
approach. Maybe one of the Googlers on this list will have a better  
idea about why groups are now deprecated in favour of nested messages.
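For comparison, here is a small sketch of the two framings in plain Java. The wire type numbers (2 for length-delimited, 3 for start group, 4 for end group) and the tag layout (field number shifted left by 3, OR-ed with the wire type) come from the protobuf wire format; the class and method names are invented for this example, and `body` stands for fields that have already been encoded.

```java
import java.io.ByteArrayOutputStream;

/** Sketch: group framing versus nested-message framing. */
final class GroupFramingSketch {
    static final int WIRETYPE_LENGTH_DELIMITED = 2;
    static final int WIRETYPE_START_GROUP = 3;
    static final int WIRETYPE_END_GROUP = 4;

    static int makeTag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    static void writeVarint32(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Group: start tag, body, end tag. No size is needed before writing. */
    static byte[] frameAsGroup(int fieldNumber, byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint32(out, makeTag(fieldNumber, WIRETYPE_START_GROUP));
        out.write(body, 0, body.length);
        writeVarint32(out, makeTag(fieldNumber, WIRETYPE_END_GROUP));
        return out.toByteArray();
    }

    /** Nested message: tag, varint length, body. The size must be known first. */
    static byte[] frameAsMessage(int fieldNumber, byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint32(out, makeTag(fieldNumber, WIRETYPE_LENGTH_DELIMITED));
        writeVarint32(out, body.length);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }
}
```

No escaping is needed in the group case because the reader only looks for the end tag at field boundaries, never inside field data.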




Anyway, I was referring more to the implementation. For example, we
could first serialize the message to a ByteArrayOutputStream, then
write the result and its size to the output. Obviously this approach
is much slower, but I was wondering if there were other similar
approaches.


That's true, and would work. The other option would be to use fixed  
width integers for the lengths, so then you could reserve space in  
the buffer, serialize the message, then go back and fill in the length  
field. This would be an incompatible change to the serialization  
format, however.
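A minimal sketch of that fixed-width variant, assuming a `java.nio.ByteBuffer` as the output and a 4-byte length. As noted above, this is not the protobuf wire format; `BodyWriter` and the class name are hypothetical.

```java
import java.nio.ByteBuffer;

/** Sketch: reserve a fixed-width length slot, write the body, then backfill. */
final class FixedWidthLengthSketch {
    /** Hypothetical stand-in for a message that writes its fields directly. */
    interface BodyWriter {
        void writeTo(ByteBuffer buf);
    }

    static void writeWithFixedLength(ByteBuffer buf, BodyWriter body) {
        int lengthPos = buf.position();
        buf.position(lengthPos + 4);   // reserve 4 bytes for the length
        int bodyStart = buf.position();
        body.writeTo(buf);             // serialize without knowing the size
        int length = buf.position() - bodyStart;
        buf.putInt(lengthPos, length); // absolute put: go back and fill it in
    }
}
```

The byte order of the length is whatever the caller set on the buffer. The cost is that every length takes 4 bytes, even for tiny sub-messages.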


Evan

--
Evan Jones
http://evanjones.ca/




Re: [protobuf] Re: Java implementation questions

2010-08-05 Thread Igor Gatis
Hey Ralf, just out of curiosity: what motivated you to create your own J2ME
port?

The reason I'm asking is that I wrote my own too
(http://code.google.com/p/protobuf-j2me/), but only because, at the time I
published mine, none of the available J2ME implementations was complete,
focused on code size, or 100% compatible with all protobuf 2.3.0 features.
I'm just wondering what made you discard them too (including mine).






Re: [protobuf] Re: Java implementation questions

2010-08-05 Thread Ralf Kistner
Igor,

I started the port in February this year. As far as I know, only two other
ports existed at the time, and neither of them met my requirements (nested
messages).

I saw your port today for the first time. I still have to investigate
it, but I have a feeling the two ports have different priorities.

From inspection your port at http://code.google.com/p/protobuf-j2me/
aims to be feature-complete, and close to the official Java
implementation.

My port at http://github.com/ponderingpanda/protobuf-j2me started from
one of the other J2ME ports, and I added support for features I
needed. The big difference is that it aims to keep the generated code
as small as possible. I'm working on projects with more than 40
messages, so the generated code size is important to me. To achieve
this I sacrificed some of the features, such as immutable/read-only
messages.

However, the port is slowly moving back closer to the official Java
port. Overall the basic runtime library (CodedInputStream, etc) is
very similar in our two ports and the official implementation. The
biggest difference is in the generated messages.

I'm open to the possibility of merging the two ports - users should
not have to choose between four J2ME ports. However, the fundamental
differences in the generated messages might make this impractical.

Ralf







Re: [protobuf] Re: Java implementation questions

2010-08-05 Thread Igor Gatis
On Thu, Aug 5, 2010 at 4:09 PM, Ralf Kistner ralf.kist...@gmail.com wrote:

 Igor,

 I started the port in February this year. As far as I know, only two other
 ports existed at the time, and neither of them met my requirements (nested
 messages).

 I saw your port today for the first time. I still have to investigate
 it, but I have a feeling the two ports have different priorities.

 From inspection your port at http://code.google.com/p/protobuf-j2me/
 aims to be feature-complete, and close to the official Java
 implementation.

 My port at http://github.com/ponderingpanda/protobuf-j2me started from
 one of the other J2ME ports, and I added support for features I
 needed. The big difference is that it aims to keep the generated code
 as small as possible. I'm working on projects with more than 40
 messages, so the generated code size is important to me. To achieve
 this I sacrificed some of the features, such as immutable/read-only
 messages.


The big difference is that it aims to keep the generated code as small as
possible

This was my main motivation. :) My project generates code which is 65% to
70% smaller than the J2SE version (and this is before code optimizers like
ProGuard). The runtime library is as small as 22k. I've made similar
sacrifices to achieve that. Besides, I recently added a few advanced
features, such as turning the builder pattern off and C-style enums (which
use bare integers rather than classes).

If you want to give it a shot, here are a few instructions:
http://code.google.com/p/protobuf-j2me/wiki/LinuxBuildInstructions



 However, the port is slowly moving closer back to the official Java
 port. Overall the basic runtime library (CodedInputStream, etc) is
 very similar in our two ports and the official implementation. The
 biggest difference is in the generated messages.

 I'm open to the possibility of merging the two ports - users should
 not have to choose between four J2ME ports. However, the fundamental
 differences in the generated messages might make this impractical.


Protobuf-j2me code is already being used by some big projects now. It can't
afford disruptive changes anymore. But the project maintainers do accept
patches for fixes and improvements if you want to contribute.







Re: [protobuf] Re: Java implementation questions

2010-08-05 Thread Jason Hsueh
Groups are primarily deprecated because it used to be the case that groups
could not be reused in other types. Now groups can be reused, but other
messages must use the length-delimited format, rather than the group format.
 In theory, the fact that groups do not require a length prefix makes them
more attractive. In practice, you typically end up computing message sizes
during serialization anyway (see below), so there is no benefit to using
groups. Messages are preferred stylistically: with a group you define a
message type and a field at once. I'll admit there are applications where
the group format is useful, specifically to stream a serialization by
constructing the message on the fly. But internally, there doesn't seem to
be a lot of demand for this.

There are a number of performance optimizations (in C++, anyway) that depend
on having the total size before the data is serialized. When writing to a
string, having the size allows you to preallocate the array, avoiding
multiple reallocs. For output to abstract streams, it is often the case that
a message can fit into the buffer space available in the output, in which
case the faster code path serializing to a flat array can be used. This also
applies to embedded messages, so even if the parent message doesn't fit,
messages farther down the structure can still use the faster path. These
turned out to be pretty significant gains. (Again, C++ only. I don't know if
there are similar benefits in Java.)
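To illustrate the mechanism only (this makes no claim about how much it helps in Java), here is a sketch of size-first serialization into a flat array. The message is a hypothetical one with two non-negative varint fields numbered 1 and 2; all names are made up for the example.

```java
/** Sketch: compute the size first, then write into one exact-size array. */
final class PreallocateSketch {
    /** Number of bytes a value takes as a varint (treated as unsigned). */
    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    /** Writes a varint at pos and returns the position after it. */
    static int writeVarint(byte[] buf, int pos, int value) {
        while ((value & ~0x7F) != 0) {
            buf[pos++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buf[pos++] = (byte) value;
        return pos;
    }

    /** One tag byte per field plus the varint payloads. */
    static int serializedSize(int x, int y) {
        return 1 + varintSize(x) + 1 + varintSize(y);
    }

    static byte[] toByteArray(int x, int y) {
        byte[] out = new byte[serializedSize(x, y)]; // single exact allocation
        int pos = 0;
        out[pos++] = 0x08; // field 1, wire type 0 (varint)
        pos = writeVarint(out, pos, x);
        out[pos++] = 0x10; // field 2, wire type 0 (varint)
        pos = writeVarint(out, pos, y);
        if (pos != out.length) {
            throw new IllegalStateException("size mismatch");
        }
        return out;
    }
}
```

Because the array is exactly the right size, the write loop needs no growth checks or copies, which is the flat-array fast path described above.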

On Thu, Aug 5, 2010 at 6:53 AM, Evan Jones ev...@mit.edu wrote:

 On Aug 5, 2010, at 9:16 , Ralf wrote:

 I might be mistaken, but didn't groups use this approach - use a
 special tag to indicate the end of a message? As only tags are
 checked, there is no need to escape any data.


 Good point, I forgot about groups. They definitely do use that approach.
 Maybe one of the Googlers on this list will have a better idea about why
 groups are now deprecated in favour of nested messages.



  Anyway, I was referring more to the implementation. For example, we
 could first serialize the message to a ByteArrayOutputStream, then
 write the result and its size to the output. Obviously this approach
 is much slower, but I was wondering if there were other similar
 approaches.



Yes, you could do something like this. If you have some way to efficiently
copy the bytes, it might be a win to use this approach and avoid computing
sizes altogether.
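A sketch of that buffer-then-copy approach in plain Java, with no size computation at all. `FieldWriter` is a hypothetical stand-in for a message that can write its own fields, and the other names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Sketch: serialize to a scratch buffer, then emit the length and the bytes. */
final class BufferThenPrefixSketch {
    /** Hypothetical stand-in for a message that writes its own fields. */
    interface FieldWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    static void writeVarint32(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static void writeDelimited(OutputStream out, FieldWriter message)
            throws IOException {
        ByteArrayOutputStream scratch = new ByteArrayOutputStream();
        message.writeTo(scratch);          // size is unknown until this returns
        writeVarint32(out, scratch.size());
        scratch.writeTo(out);              // the extra copy this approach pays for
    }
}
```

Every level of nesting adds another scratch buffer and another copy, so the cost grows with the depth of the message structure.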



 That's true, and would work. The other option would be to use fixed width
 integers for the lengths, so then you could reserve space in the buffer,
 serialize the message, then go back and fill in the length field. This would
 be an incompatible change to the serialization format, however.


In fact, we used to use a similar approach where we would assume that sub
messages are small. The code would leave a gap for the (varint-encoded) size
and serialize the message, then go back and fill in the length. If the
assumption was wrong, the data would have to be shifted. Using cached byte
sizes turned out to be a win in most cases though, particularly in complex
message structures.
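The gap-and-shift scheme might look roughly like the following. This is a reconstruction from the description above, not the original code: it assumes a one-byte gap, a flat array with enough spare room, and a hypothetical `BodyWriter` interface.

```java
/** Sketch: leave a one-byte gap for the varint length, shift if it was too small. */
final class VarintGapSketch {
    /** Hypothetical: writes fields into buf starting at pos, returns the end position. */
    interface BodyWriter {
        int writeTo(byte[] buf, int pos);
    }

    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    static int writeVarint(byte[] buf, int pos, int value) {
        while ((value & ~0x7F) != 0) {
            buf[pos++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buf[pos++] = (byte) value;
        return pos;
    }

    /** Writes length plus body at pos and returns the position after the body. */
    static int writeDelimited(byte[] buf, int pos, BodyWriter body) {
        int bodyStart = pos + 1;                  // optimistic one-byte gap
        int bodyEnd = body.writeTo(buf, bodyStart);
        int length = bodyEnd - bodyStart;
        int needed = varintSize(length);
        if (needed > 1) {
            // The guess was wrong: move the body right to make room.
            System.arraycopy(buf, bodyStart, buf, pos + needed, length);
            bodyEnd = pos + needed + length;
        }
        writeVarint(buf, pos, length);            // go back and fill in the length
        return bodyEnd;
    }
}
```

Bodies under 128 bytes take the cheap path. Anything larger pays for a shift, and with nested messages the same bytes can be shifted once per enclosing level, which may be part of why cached sizes won out.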
