Re: [protobuf] Re: Java implementation questions
Another big advantage of writing the size first is that you can potentially write parsers which skip over sub-messages quickly, and perhaps parse them lazily. Currently the Google-provided implementations do not implement this, but there are some third-party implementations that use lazy parsing.

On Thu, Aug 5, 2010 at 4:18 PM, Jason Hsueh jas...@google.com wrote:
> Groups are primarily deprecated because it used to be the case that groups could not be reused in other types. Now groups can be reused, but other messages must use the length-delimited format, rather than the group format. In theory, the fact that groups do not require a length prefix makes them more attractive. In practice, you typically end up computing message sizes during serialization anyway (see below), so there is no benefit to using groups. Messages are preferred stylistically: with a group you define a message type and a field at once.
>
> I'll admit there are applications where the group format is useful, specifically to stream a serialization by constructing the message on the fly. But internally, there doesn't seem to be a lot of demand for this.
>
> There are a number of performance optimizations (in C++, anyway) that depend on having the total size before the data is serialized. When writing to a string, having the size allows you to preallocate the array, avoiding multiple reallocs. For output to abstract streams, it is often the case that a message can fit into the buffer space available in the output, in which case the faster code path serializing to a flat array can be used. This also applies to embedded messages, so even if the parent message doesn't fit, messages farther down the structure can still use the faster path. These turned out to be pretty significant gains. (Again, C++ only. I don't know if there are similar benefits in Java.)
>
> On Thu, Aug 5, 2010 at 6:53 AM, Evan Jones ev...@mit.edu wrote:
> > On Aug 5, 2010, at 9:16 , Ralf wrote:
> > > I might be mistaken, but didn't groups use this approach - use a special tag to indicate the end of a message? As only tags are checked, there is no need to escape any data.
> >
> > Good point, I forgot about groups. They definitely do use that approach. Maybe one of the Googlers on this list will have a better idea about why groups are now deprecated in favour of nested messages.
> >
> > > Anyway, I was referring more to the implementation. For example, we could first serialize the message to a ByteArrayOutputStream, then write the result and its size to the output. Obviously this approach is much slower, but I was wondering if there were other similar approaches.
>
> Yes, you could do something like this. If you have some way to efficiently copy the bytes, it might be a win to use this approach and avoid computing sizes altogether.
>
> > That's true, and would work. The other option would be to use fixed width integers for the lengths, so then you could reserve space in the buffer, serialize the message, then go back and fill in the length field. This would be an incompatible change to the serialization format, however.
>
> In fact, we used to use a similar approach where we would assume that sub messages are small. The code would leave a gap for the (varint-encoded) size and serialize the message, then go back and fill in the length. If the assumption was wrong, the data would have to be shifted. Using cached byte sizes turned out to be a win in most cases though, particularly in complex message structures.
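The gap-and-shift scheme described at the end of the quoted message can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not code from any protobuf implementation: the class `VarintGapSketch` and its methods are hypothetical names, and the body is handed in as a ready-made byte array where a real serializer would write the fields in place.

```java
final class VarintGapSketch {
    /** Number of bytes the base-128 varint encoding of value occupies. */
    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    /** Writes value as a varint at pos and returns the position after it. */
    static int putVarint(byte[] buf, int pos, int value) {
        while ((value & ~0x7F) != 0) {
            buf[pos++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buf[pos++] = (byte) value;
        return pos;
    }

    /**
     * Leaves a one-byte gap at pos (betting that the body is shorter than
     * 128 bytes), writes the body, then back-fills the length. If the bet
     * was wrong, the body is shifted to make room for a longer varint.
     * Returns the position just past the sub-message.
     */
    static int writeWithGap(byte[] buf, int pos, byte[] body) {
        int bodyStart = pos + 1;
        // Stand-in for serializing the sub-message fields directly into buf.
        System.arraycopy(body, 0, buf, bodyStart, body.length);
        int lengthSize = varintSize(body.length);
        if (lengthSize > 1) {
            // Wrong guess: move the already-written body to its real offset.
            System.arraycopy(buf, bodyStart, buf, pos + lengthSize, body.length);
        }
        putVarint(buf, pos, body.length);
        return pos + lengthSize + body.length;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[512];
        int end = writeWithGap(buf, 0, new byte[300]);
        System.out.println("bytes used: " + end);
    }
}
```

The copy is only paid when the body reaches 128 bytes, which is the case the "sub-messages are small" assumption bets against. Computing and caching each message's size before writing any bytes avoids the shift entirely, at the cost of a sizing pass.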
-- You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to proto...@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
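The point made at the top of this message, that a leading size lets a parser step over a sub-message and parse it later, can be illustrated with a small Java sketch. The names `LazySkipSketch`, `readVarint32` and `sliceLengthDelimited` are hypothetical and are not part of the official library; the only facts assumed about the wire format are that a length-delimited field is a varint tag with wire type 2, followed by a varint byte count, followed by that many payload bytes.

```java
import java.nio.ByteBuffer;

final class LazySkipSketch {
    /** Reads a base-128 varint of up to five bytes. */
    static int readVarint32(ByteBuffer in) {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = in.get();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalStateException("varint too long");
    }

    /**
     * Reads the length prefix, returns a view of the still-unparsed payload,
     * and advances in past it without looking at any payload byte.
     */
    static ByteBuffer sliceLengthDelimited(ByteBuffer in) {
        int length = readVarint32(in);
        if (length < 0 || length > in.remaining()) {
            throw new IllegalStateException("truncated sub-message");
        }
        ByteBuffer payload = in.slice();
        payload.limit(length);
        in.position(in.position() + length);
        return payload;
    }

    public static void main(String[] args) {
        // Field 1 (length-delimited, 3 payload bytes), then field 2 (varint 5).
        byte[] wire = {0x0A, 0x03, 0x08, (byte) 0x96, 0x01, 0x10, 0x05};
        ByteBuffer in = ByteBuffer.wrap(wire);
        int tag = readVarint32(in);
        ByteBuffer deferred = sliceLengthDelimited(in);
        System.out.println("tag " + tag + ", deferred bytes: " + deferred.remaining());
        System.out.println("next tag: " + readVarint32(in));
    }
}
```

A lazy parser could keep the returned view and decode it only when the field is first read. With the group format there is no byte count to jump by, so the parser has to walk every nested field until it reaches the matching end tag.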
Re: [protobuf] Re: Java implementation questions
On Aug 5, 2010, at 9:16 , Ralf wrote:
> I might be mistaken, but didn't groups use this approach - use a special tag to indicate the end of a message? As only tags are checked, there is no need to escape any data.

Good point, I forgot about groups. They definitely do use that approach. Maybe one of the Googlers on this list will have a better idea about why groups are now deprecated in favour of nested messages.

> Anyway, I was referring more to the implementation. For example, we could first serialize the message to a ByteArrayOutputStream, then write the result and its size to the output. Obviously this approach is much slower, but I was wondering if there were other similar approaches.

That's true, and would work. The other option would be to use fixed width integers for the lengths, so then you could reserve space in the buffer, serialize the message, then go back and fill in the length field. This would be an incompatible change to the serialization format, however.

Evan
--
Evan Jones
http://evanjones.ca/
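The two alternatives discussed in this message can be sketched side by side in Java. Everything here is illustrative: `LengthPrefixSketch`, `BodyWriter`, `writeBuffered` and `writeFixedWidth` are hypothetical names, and the second method deliberately produces a format that protobuf parsers would not accept, since the length is a fixed four-byte integer (big-endian here only because that is the `ByteBuffer` default) rather than a varint.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.function.Consumer;

final class LengthPrefixSketch {
    /** Hypothetical stand-in for a message that can write its own fields. */
    interface BodyWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    static void writeVarint32(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /**
     * Ralf's suggestion: serialize into a scratch buffer first, then emit
     * the varint length followed by the buffered bytes. No size computation
     * is needed, but every byte is copied once more.
     */
    static void writeBuffered(OutputStream out, BodyWriter body) throws IOException {
        ByteArrayOutputStream scratch = new ByteArrayOutputStream();
        body.writeTo(scratch);
        writeVarint32(out, scratch.size());
        scratch.writeTo(out);
    }

    /**
     * Evan's alternative: reserve a fixed-width length slot, serialize the
     * body in place, then go back and fill in the slot. Not wire-compatible
     * with protobuf, which expects a varint length.
     */
    static void writeFixedWidth(ByteBuffer out, Consumer<ByteBuffer> body) {
        int lengthPos = out.position();
        out.putInt(0);                          // placeholder for the length
        int bodyStart = out.position();
        body.accept(out);                       // body written directly to out
        out.putInt(lengthPos, out.position() - bodyStart);
    }

    public static void main(String[] args) throws IOException {
        byte[] fields = {0x08, (byte) 0x96, 0x01};
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        writeBuffered(stream, out -> out.write(fields));
        ByteBuffer buffer = ByteBuffer.allocate(16);
        writeFixedWidth(buffer, out -> out.put(fields));
        System.out.println(stream.size() + " bytes vs " + buffer.position() + " bytes");
    }
}
```

The fixed-width variant needs no scratch buffer and no shifting, which is what makes it attractive, but it spends four bytes on every length and breaks compatibility with existing readers.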
Re: [protobuf] Re: Java implementation questions
Hey Ralf, just out of curiosity: what motivated you to create your own J2ME port? The reason I'm asking is that I wrote my own too (http://code.google.com/p/protobuf-j2me/), simply because, by the time I published mine, none of the available J2ME implementations was complete, focused on code size, or 100% compatible with all protobuf 2.3.0 features. I'm just wondering what made you discard them too (including mine).
Re: [protobuf] Re: Java implementation questions
Igor,

I started the port in February this year; as far as I know, only two other ports existed, and neither of them met my requirements (nested messages). I saw your port today for the first time. I still have to investigate it, but I have a feeling the two ports have different priorities.

From inspection, your port at http://code.google.com/p/protobuf-j2me/ aims to be feature-complete and close to the official Java implementation.

My port at http://github.com/ponderingpanda/protobuf-j2me started from one of the other J2ME ports, and I added support for the features I needed. The big difference is that it aims to keep the generated code as small as possible. I'm working on projects with more than 40 messages, so the generated code size is important to me. To achieve this I sacrificed some of the features, such as immutable/read-only messages.

However, the port is slowly moving back closer to the official Java port. Overall, the basic runtime library (CodedInputStream, etc.) is very similar in our two ports and the official implementation. The biggest difference is in the generated messages.

I'm open to the possibility of merging the two ports - users should not have to choose between four J2ME ports. However, the fundamental differences in the generated messages might make this impractical.

Ralf

On Thu, Aug 5, 2010 at 8:27 PM, Igor Gatis igorga...@gmail.com wrote:
> Hey Ralf, just out of curiosity: what motivated you to create your own J2ME port? The reason I'm asking is that I wrote my own too (http://code.google.com/p/protobuf-j2me/), simply because, by the time I published mine, none of the available J2ME implementations was complete, focused on code size, or 100% compatible with all protobuf 2.3.0 features. I'm just wondering what made you discard them too (including mine).
Re: [protobuf] Re: Java implementation questions
On Thu, Aug 5, 2010 at 4:09 PM, Ralf Kistner ralf.kist...@gmail.com wrote:
> Igor,
>
> I started the port in February this year; as far as I know, only two other ports existed, and neither of them met my requirements (nested messages). I saw your port today for the first time. I still have to investigate it, but I have a feeling the two ports have different priorities.
>
> From inspection, your port at http://code.google.com/p/protobuf-j2me/ aims to be feature-complete and close to the official Java implementation.
>
> My port at http://github.com/ponderingpanda/protobuf-j2me started from one of the other J2ME ports, and I added support for the features I needed. The big difference is that it aims to keep the generated code as small as possible. I'm working on projects with more than 40 messages, so the generated code size is important to me. To achieve this I sacrificed some of the features, such as immutable/read-only messages.

> The big difference is that it aims to keep the generated code as small as possible

This was my main motivation. :) My project generates code which is 65% to 70% smaller than the J2SE version (and this is before code optimizers like ProGuard). The runtime library is as small as 22k. I've made similar sacrifices to achieve that. Besides, I recently added a few advanced features, such as turning the builder pattern off and C-style enums (which use bare integers rather than classes).

If you want to give it a shot, here are a few instructions: http://code.google.com/p/protobuf-j2me/wiki/LinuxBuildInstructions

> However, the port is slowly moving back closer to the official Java port. Overall, the basic runtime library (CodedInputStream, etc.) is very similar in our two ports and the official implementation. The biggest difference is in the generated messages.
>
> I'm open to the possibility of merging the two ports - users should not have to choose between four J2ME ports. However, the fundamental differences in the generated messages might make this impractical.

Protobuf-j2me code is already being used by some big projects now. It can't afford disruptive changes anymore. But the project maintainers do accept patches for fixes and improvements if you want to contribute.
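To make the "C-style enums" trade-off mentioned above concrete, here is a rough Java sketch of the two styles. It is not the code generated by either port; `EnumStyleSketch`, `Color` and `ColorConstants` are hypothetical names chosen for illustration, and the class-based variant follows the pre-Java-5 typesafe-enum pattern on the assumption that the target platform has no `enum` keyword.

```java
final class EnumStyleSketch {
    /** Class-based style: one object per value plus a lookup method. */
    static final class Color {
        static final Color RED = new Color(0);
        static final Color GREEN = new Color(1);

        private final int number;

        private Color(int number) {
            this.number = number;
        }

        int getNumber() {
            return number;
        }

        /** Maps a wire value back to its object, or null if unknown. */
        static Color valueOf(int number) {
            switch (number) {
                case 0: return RED;
                case 1: return GREEN;
                default: return null;
            }
        }
    }

    /** C-style: bare integer constants, no objects and no lookup code. */
    static final class ColorConstants {
        static final int RED = 0;
        static final int GREEN = 1;

        private ColorConstants() {
        }
    }

    public static void main(String[] args) {
        Color parsed = Color.valueOf(1);
        int raw = ColorConstants.GREEN;
        System.out.println(parsed.getNumber() == raw);
    }
}
```

The constants-only form compiles to far less bytecode per enum, which matters when many messages are generated, but it gives up type safety: any `int` is accepted where a colour is expected.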