Re: Message forwarding and partial parsing
Hi Kenton,

Thank you for your quick response and your feedback. I'm going to use option 3, since as you say it will probably be the fastest solution, and I think it will fit best with our application. You are probably right that this will not be an issue for most messages that are going to be forwarded (most messages will be quite small), but I consider the complexity of the different options to be roughly the same, so I might as well go for the solution that feels best.

Thanks,
V

On Oct 7, 9:44 pm, Kenton Varda ken...@google.com wrote:
> On Wed, Oct 7, 2009 at 5:46 AM, villintehaspam villintehas...@gmail.com wrote:
> > I am wondering about the best way of forwarding received protocol buffer
> > messages from one entity to another without having to parse the entire
> > message just to serialize it again.
>
> It looks like you've figured out all the major options. One thing I'd
> encourage you to do, if you haven't already, is actually profile your
> system to find out whether repeated parsing and serialization is a real
> problem for you. It may not be a problem in practice even if it feels wrong.
>
> > Of these three options, I'm thinking that option 3 is the correct way to go.
>
> All three options are reasonable. Option 3 is the most complicated
> solution, but probably the most performant.
>
> > Am I missing some functionality provided by protocol buffers (such as the
> > ability to skip parsing extensions even if they are recognized, or to
> > parse only as much as needed)? Am I missing any problems?
>
> If you are using C++, then all compiled-in extensions will be eagerly
> parsed. If you only compile in the extensions that each process actually
> cares about, that solves your problem. In Java you provide an
> ExtensionRegistry listing the extensions you care about, so it's trivial
> to include only the ones you want. I'm guessing you aren't using Java.
>
> > On a somewhat related note, is it possible to parse a partially
> > transmitted message and continue parsing at a later time when more data
> > is available?
>
> Not without blocking. The library is designed to parse an entire message
> at once. Allowing partial parsing (without blocking) would be quite
> complicated.

--~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/protobuf?hl=en -~--~~~~--~~--~--~---
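The thread never spells out what the three options were, but a common pattern for forwarding without reparsing, sketched here purely as an assumption, is an envelope whose body is an opaque bytes field. The Envelope message and its field names below are hypothetical, not from the thread: a forwarder parses only the small routing fields and copies the payload bytes through untouched.

```proto
// Hypothetical envelope message (proto2 syntax, matching the era of this
// thread). The inner message travels as raw bytes, so intermediate hops
// never parse or re-serialize it.
message Envelope {
  required string destination = 1;  // routing info the forwarder does parse
  optional bytes payload = 2;       // serialized inner message, copied verbatim
}
```

The trade-off is that the final recipient must know (or be told via another field) which message type the payload bytes contain before parsing them.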
Generating .proto files from Java source files
Hi all, a guy who no longer works at my company defined some protocol buffer messages that we still use. We need to extend these messages now, but we don't have the .proto file. Is there a straightforward way to generate the .proto files from the Java classes? How could I do this? Thanks
arrays??
Hi,

I've looked at protocol buffers, and I've noted that there is no support for arrays of values (doubles, integers). This is a significant drawback; JSON, HDF5, etc. all have this. One post suggested putting an array into a single string field. I did this, and the performance was very bad in Java and very memory-consuming (compared to standard Java serialization). I wrote the same array (10,000 double numbers) 500 times, and after array 500 my computer was out of memory.

Secondly, all the tutorials suggest that the file should be written at once, i.e. at the end of the program, when the messages are filled. I want to write data to disk in several steps: say I want to write one record first (say, one array), then append data to the existing file, and so on; this way I would not need to keep all records in memory. The merge mechanism shown in the tutorial seems to parse the old file first, then add the new record and write a new file. Do I understand this correctly? If yes, then protocol buffers are not very good for large data volumes, especially with numerical arrays.

best wishes, Sergei
Re: arrays??
On Thu, Oct 8, 2009 at 5:35 PM, sergei175 sergei...@googlemail.com wrote:
> I've looked at protocol buffers, and I've noted that there is no support
> for arrays of values (doubles, integers). This is a significant drawback;
> JSON, HDF5, etc. all have this.

Have you looked at repeated fields? You can define one like so:

    repeated double my_number = 1;
Re: arrays??
Hi,

This is exactly what I did before putting the arrays into a string. When I implemented the arrays via repeated fields, the program was even slower, and the file size was too large (compared to the Java serialization mechanism + zip). That is why I moved my array into a string, thinking that there would be no significant overhead in storing such an object. I guess each repeated field needs some additional bits to store it.

Yes, I used [packed=true] for the double field. I did not check what happens after removing it (probably the file size would be even bigger!).

cheers, Sergei

On Oct 8, 10:48 am, Marc Gravell marc.grav...@gmail.com wrote:
> For basic types, you can also use packed encoding to reduce the space
> required; just add [packed=true] to a repeated element.
>
> Marc
Re: arrays??
Hi,

> This is exactly what I did before putting the arrays into a string. When I
> implemented the arrays via repeated fields, the program was even slower,
> and the file size was too large (compared to the Java serialization
> mechanism + zip).

If you put the values in a string and do your own array management on top, as compared to using a repeated field with the packed option, there should not be a significant difference, because it is essentially the same. Protobufs don't come with compression, so if you compare sizes, you need to compare compressed Java serialization with compressed proto serialization.

If you provide an example of what you want to do and which current solutions you are comparing, people on this list might be able to help.

-h
Re: arrays??
Ok, this is a simple example of a protocol buffers file. I want to write 1000 Records. Each Record has a name and a NamedArray; each array has a name and a set of double numbers. For my example, I filled the array with 10,000 numbers for all 1000 Records. There are three things you will see:

1) After event 500, even 200MB of memory is not enough.
2) It's slower by a factor of ~5 compared to Java serialization with compression.
3) The file size is very large. I do not know how to write compressed records on the fly using this package.

Finally, there is not even a sensible approach to append new Records to an existing file (without merge, which in fact has to parse the existing file first!). So I do not see any superiority of Protocol Buffers compared to other file formats; it's actually much worse when it comes to such situations.

    // organize in repeated records
    message Record {
      optional string name = 1;
      message NamedArray {
        required string name = 1 [default = "none"];
        repeated double value = 2 [packed=true];
      }
      optional NamedArray array = 2;
    }

    message PBuffer {
      repeated Record record = 1;
    }

On Oct 8, 11:33 am, Henner Zeller h.zel...@acm.org wrote:
> If you put the values in a string and do your own array management on top,
> as compared to using a repeated field with the packed option, there should
> not be a significant difference, because it is essentially the same.
Re: arrays??
Hi,

On Thu, Oct 8, 2009 at 10:57, sergei175 sergei...@googlemail.com wrote:
> 1) After event 500, even 200MB of memory is not enough.
> 2) It's slower by a factor of ~5 compared to Java serialization with
> compression.

So for Java serialization, you have a class that contains an ArrayList<NamedArray> with NamedArray objects containing a Vector<Double>, and then you serialize the whole ArrayList<NamedArray> to disk?

> 3) The file size is very large. I do not know how to write compressed
> records on the fly using this package.

If you want to write independent records, you should write them delimited to a file and not put everything in memory. Regarding compression: you write the stuff to a stream eventually, so you can wrap that with a GZIPOutputStream; I guess that is what you do with the Java serialization with compression as well.

> Finally, there is not even a sensible approach to append new Records to an
> existing file (without merge, which in fact has to parse the existing file
> first!)

Protocol buffers don't provide the transport or storage layer; they provide the encoding. You have to provide the storage yourself. A simple default implementation might be useful to get started, but many people would still need to write their own way of storing things. On the other hand, it is only a handful of lines to write it yourself. For things like this (and it has been discussed many times on this list), you should write out delimiters telling the size of the next record, followed by the record itself. I think something has even been added to the API recently to make this simpler (I don't know; I use my own implementation ;) )

-h
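The size-delimited framing described above can be sketched without the protobuf library at all. The class below is a hypothetical stand-in that prefixes each record with its length as a base-128 varint (the same style of prefix the delimited helpers later added to the Java API use), so new records can simply be appended to a file and read back one at a time:

```java
import java.io.*;

// Sketch of size-delimited record framing: each record is preceded by its
// length encoded as a base-128 varint (7 payload bits per byte, high bit
// set on all but the last byte).
public class DelimitedFraming {
    // Write the length as a varint, then the record bytes.
    static void writeRecord(OutputStream out, byte[] record) throws IOException {
        int n = record.length;
        while ((n & ~0x7F) != 0) {         // more than 7 bits remain
            out.write((n & 0x7F) | 0x80);  // low 7 bits, continuation bit set
            n >>>= 7;
        }
        out.write(n);
        out.write(record);
    }

    // Read one length-prefixed record; returns null at a clean end of stream.
    static byte[] readRecord(InputStream in) throws IOException {
        int n = 0, shift = 0, b = in.read();
        if (b < 0) return null;            // no more records
        while (true) {
            n |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;    // last varint byte
            shift += 7;
            b = in.read();
            if (b < 0) throw new EOFException("truncated length prefix");
        }
        byte[] record = new byte[n];
        new DataInputStream(in).readFully(record);
        return record;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        writeRecord(file, "record one".getBytes("UTF-8"));
        writeRecord(file, "record two".getBytes("UTF-8"));  // appended later

        InputStream in = new ByteArrayInputStream(file.toByteArray());
        byte[] r;
        while ((r = readRecord(in)) != null)
            System.out.println(new String(r, "UTF-8"));
    }
}
```

In a real program the record bytes would come from a serialized message rather than a string; the point is only that appending a record never requires parsing what is already in the file.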
Re: arrays??
On Thu, Oct 8, 2009 at 10:57 AM, sergei175 sergei...@googlemail.com wrote:
> 1) After event 500, even 200MB of memory is not enough.
> 2) It's slower by a factor of ~5 compared to Java serialization with
> compression.

Protocol Buffers do not include compression, so to make this comparison fair you would need to add compression on top of them too. If your speed is dominated by file I/O time (likely!), then you might find that this makes protocol buffers faster.

> 3) The file size is very large. I do not know how to write compressed
> records on the fly using this package.

Use java.util.zip.GZIPOutputStream.

> Finally, there is not even a sensible approach to append new Records to an
> existing file (without merge, which in fact has to parse the existing file
> first!)

http://code.google.com/apis/protocolbuffers/docs/techniques.html#streaming

Protocol Buffers convert between raw bytes and structures. They are not intended to provide a mechanism for managing multiple individually-loadable records. If you have a very large data set, you need to split that set into individual records in order to avoid reading/writing the whole thing at once. Each individual record can be encoded using protobufs, but you should not encode the entire file as a protobuf.

> So, I do not see any superiority of Protocol Buffers compared to other
> file formats; it's actually much worse when it comes to such situations.

By all means, don't use them then.
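The GZIPOutputStream suggestion in a runnable stdlib-only sketch; the class name and the 80,000-byte all-zero stand-in payload are made up for illustration, and a real program would route a message's writeTo() output through the same wrapper:

```java
import java.io.*;
import java.util.zip.*;

// Sketch: compress serialized bytes on the fly by wrapping the output
// stream in GZIPOutputStream; reading wraps the input in GZIPInputStream.
public class GzipRoundTrip {
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data);  // a message's writeTo(gz) would go here instead
        }
        return buf.toByteArray();
    }

    static byte[] decompress(byte[] data) throws IOException {
        GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = gz.read(chunk)) > 0) out.write(chunk, 0, n);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Highly repetitive data (like thousands of similar doubles)
        // compresses dramatically.
        byte[] original = new byte[80000];  // all zeros, a stand-in payload
        byte[] packed = compress(original);
        System.out.println(packed.length < original.length / 10);  // prints true
    }
}
```

Wrapping the stream this way makes the protobuf-vs-Java-serialization size comparison fair, since the Java numbers in this thread already include zip compression.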
Re: Generating .proto files from Java source files
Yikes. That's kind of like someone leaving you with just .class files without the .java files. If you look at the generated code, though, you will notice that there are comments in it defining each field, like:

    // optional int32 i = 1;

These should be exactly the field definitions as they might appear in the .proto file. So if you extract those, keeping track of which inner class each comment appeared in, you should be able to reproduce the original .proto file.

On Thu, Oct 8, 2009 at 6:34 AM, grasshopper pbde...@gmail.com wrote:
> Hi all, a guy who no longer works at my company defined some protocol
> buffer messages that we still use. We need to extend these messages now,
> but we don't have the .proto file. Is there a straightforward way to
> generate the .proto files from the Java classes?
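The comment-extraction trick above can be scripted; this sketch uses a tiny made-up stand-in for a generated file (the real file's name and contents will differ) and keeps only the field-definition comments:

```java
import java.io.*;
import java.util.*;
import java.util.regex.*;

// Sketch: scan a protoc-generated .java file and pull out the
// "// optional int32 i = 1;"-style comments, which are (close to) the
// original .proto field definitions.
public class ProtoCommentExtractor {
    static final Pattern FIELD =
        Pattern.compile("^\\s*// ((?:optional|required|repeated) .*)$");

    static List<String> extractFields(BufferedReader src) throws IOException {
        List<String> fields = new ArrayList<String>();
        String line;
        while ((line = src.readLine()) != null) {
            Matcher m = FIELD.matcher(line);
            if (m.matches()) fields.add(m.group(1));  // the .proto line itself
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for a generated file; in practice you would
        // open the real generated .java file here.
        String generated =
            "public final class MyMessage {\n" +
            "  // optional int32 i = 1;\n" +
            "  private int i_;\n" +
            "  // repeated string name = 2;\n" +
            "}\n";
        for (String f : extractFields(new BufferedReader(new StringReader(generated))))
            System.out.println(f);
    }
}
```

As the reply notes, you would still need to track which inner class each comment appeared in to rebuild nested message blocks; this sketch only recovers the flat field lines.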
Re: Generating .proto files from Java source files
On Thu, Oct 8, 2009 at 11:32, Kenton Varda ken...@google.com wrote:
> If you look at the generated code, though, you will notice that there are
> comments in it defining each field. So if you extract those, keeping track
> of which inner class each comment appeared in, you should be able to
> reproduce the original .proto file.

Shouldn't it be possible to extract the descriptor from the Java class and then use reflection to emit a .proto file?
Re: Generating .proto files from Java source files
Hmm, that's true. Although I'm not sure there's actual code for writing out the .proto file in Java. In C++, descriptors have a DebugString() method which returns a compilable .proto file.

On Thu, Oct 8, 2009 at 11:41 AM, Henner Zeller h.zel...@acm.org wrote:
> Shouldn't it be possible to extract the descriptor from the Java class and
> then use reflection to emit a .proto file?
Compile error: must implement the inherited abstract method
Hi,

I've just downloaded and built a fresh Protocol Buffers package. I'm planning to use the Java version. I've added protoc to my path and compiled two .proto files successfully. I've created an Eclipse project and added the protobuf-java-2.2.0.jar file to the build path. The two generated source files go into the src directory. Now when Eclipse tries to build the classes, I'm getting an error like the one below for all inner classes in the generated class:

    The type OneDircontent.DirectoryContent must implement the inherited abstract method Message.toBuilder()

Any idea what's wrong? I've set my workspace to build for JDK 1.6. Thanks for any pointers you can provide!

Regards, Tom.
Re: arrays??
Thanks, I've started to understand this better. Indeed, I have to implement my own approach for I/O; protobuf alone is not enough. I only worry that my own I/O for reading and writing records will not be cross-platform, so I could not benefit from the strength of this package.

On Oct 8, 1:24 pm, Kenton Varda ken...@google.com wrote:
> Protocol Buffers convert between raw bytes and structures. They are not
> intended to provide a mechanism for managing multiple individually-loadable
> records. If you have a very large data set, you need to split that set into
> individual records in order to avoid reading/writing the whole thing at
> once. Each individual record can be encoded using protobufs, but you should
> not encode the entire file as a protobuf.