Re: [protobuf] Enum values are siblings of their type, not children of it.
It could be made an option. Something like...

message MyMessage {
  enum Foo {
    option cpp_namespace = true;
    FIRST = 0;
    ...
  }
  ...
}

...and then it could be made the default if there is ever a Proto3. But even if this just ends up on some if-there-ever-is-a-Proto3 to-do list, that would be cool... just so it's not forgotten about. It's one of those small things that make the API nicer. I've come across this scenario several times (and so have other colleagues).

On Tue, Aug 24, 2010 at 5:56 PM, Kenton Varda ken...@google.com wrote:

How would you make this change without updating millions of lines of existing C++ code that uses protobuf enums?

On Fri, Aug 20, 2010 at 11:12 AM, alopecoid alopec...@gmail.com wrote:

Hi,

This post is about the fact that protobuf enum values use C++ scoping rules, meaning that, unlike in Java, enum values are siblings of their type, not children of it. Say I have the following contrived message:

message MyMessage {
  enum Foo {
    FIRST = 0;
    SECOND = 1;
    BOTH = 2;
  }
  required Foo foo = 1;

  enum Bar {
    FIRST = 0;
    SECOND = 1;
    BOTH = 2;
  }
  required Bar bar = 2;
}

This wouldn't compile, because the protobuf compiler recognizes that in C++ the generated enum values for Foo and Bar would conflict with each other. In Java, however, this wouldn't be a problem. I would like to propose that, instead of punishing the generated Java code because of C++'s strange enum behavior (by forcing developers to rename enum values that don't actually collide in Java), the generated C++ enum declarations be wrapped in their own nested namespaces.
For example, something like:

namespace Foo {
  enum Enum { FIRST = 0, SECOND = 1, BOTH = 2 };
}
namespace Bar {
  enum Enum { FIRST = 0, SECOND = 1, BOTH = 2 };
}

At this point, the enum values would be accessed as Foo::FIRST, Bar::FIRST, etc., which would eliminate the enum value collision problem altogether and, at the same time, make them behave more like Java's enum scoping rules (which arguably make more sense).

Thoughts?

Thank you.

-- 
You received this message because you are subscribed to the Google Groups Protocol Buffers group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
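For contrast, here is the Java side of the comparison: each Java enum's constants are scoped to the enum type itself, so the two enums from the contrived .proto above can both declare FIRST, SECOND and BOTH. This is a minimal hand-written sketch, not protoc output; the class name MyMessageEnums is hypothetical, and the getNumber() accessor is added here only to carry the .proto field numbers.

```java
// Hand-written illustration (not generated code) of Java enum scoping.
// MyMessageEnums is a hypothetical name; getNumber() just stores the .proto numbers.
final class MyMessageEnums {
    // Foo's constants live inside Foo, so they cannot clash with Bar's.
    enum Foo {
        FIRST(0), SECOND(1), BOTH(2);

        private final int number;
        Foo(int number) { this.number = number; }
        int getNumber() { return number; }
    }

    // Same constant names as Foo, which is legal because each enum is its own scope.
    enum Bar {
        FIRST(0), SECOND(1), BOTH(2);

        private final int number;
        Bar(int number) { this.number = number; }
        int getNumber() { return number; }
    }

    public static void main(String[] args) {
        // Constants are always qualified by their type: Foo.FIRST vs Bar.FIRST.
        System.out.println(Foo.FIRST + " = " + Foo.FIRST.getNumber());
        System.out.println(Bar.BOTH + " = " + Bar.BOTH.getNumber());
    }
}
```

The proposed Foo::FIRST / Bar::FIRST spelling in C++ would give callers the same "qualify by type" shape that Java gets from Foo.FIRST / Bar.FIRST.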
[protobuf] Concatenation of adjacent strings in text-formatted protobuf message (Java API)
Hi,

Using the Java API, when attempting to parse a text-formatted protobuf message that contains adjacent strings that are meant to be concatenated, such as in the following contrived example:

name: John Smith
profession: mailman
description: all these strings are concatenated to form a single very long description

the following exception is thrown:

Exception in thread "main" com.google.protobuf.TextFormat$ParseException: 3:5: Expected identifier.
    at com.google.protobuf.TextFormat$Tokenizer.parseException(TextFormat.java:698)
    at com.google.protobuf.TextFormat$Tokenizer.consumeIdentifier(TextFormat.java:525)
    at com.google.protobuf.TextFormat.mergeField(TextFormat.java:851)
    at com.google.protobuf.TextFormat.merge(TextFormat.java:811)
    at com.google.protobuf.TextFormat.merge(TextFormat.java:757)

Is this not meant to be supported?

Thank you.
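For comparison, here is a sketch of the behavior the poster expected, in the spirit of C and C++ adjacent string literal concatenation. The helper AdjacentStrings.readConcatenated is hypothetical and is not part of com.google.protobuf.TextFormat: it keeps consuming double-quoted strings while the next non-whitespace character is a quote, and joins them. It deliberately ignores escape sequences, which a real tokenizer would have to handle.

```java
// Hypothetical helper, NOT part of com.google.protobuf.TextFormat.
// Folds adjacent double-quoted strings into one value; escape sequences are not handled.
final class AdjacentStrings {
    /** Reads one or more adjacent double-quoted strings starting at pos and joins them. */
    static String readConcatenated(String input, int pos) {
        StringBuilder value = new StringBuilder();
        pos = skipWhitespace(input, pos);
        if (pos >= input.length() || input.charAt(pos) != '"') {
            throw new IllegalArgumentException("Expected string at offset " + pos);
        }
        // Keep consuming quoted strings as long as the next token starts with a quote.
        while (pos < input.length() && input.charAt(pos) == '"') {
            int end = input.indexOf('"', pos + 1);
            if (end < 0) {
                throw new IllegalArgumentException("Unterminated string at offset " + pos);
            }
            value.append(input, pos + 1, end); // contents between the quotes
            pos = skipWhitespace(input, end + 1);
        }
        return value.toString();
    }

    /** Advances past spaces, tabs and newlines between tokens. */
    private static int skipWhitespace(String s, int pos) {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        String text = "\"all these \"\n  \"strings are \"\n  \"concatenated\"";
        System.out.println(readConcatenated(text, 0));
    }
}
```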
Can serialized messages be used reliably as keys?
Hi,

Can serialized messages be used reliably as keys? In other words, is it guaranteed that...

- Two equal messages will always generate equal byte sequences? (Are fields always written in the same order?)
- Two unequal messages will always generate unequal byte sequences? (Are tag identifiers enough to keep variable-length fields from accidentally producing equal byte sequences?)

I have a feeling that the answer is no. For example, given a proto with two fields, both variable-length int64 types, it seems that two unequal messages could, by chance, generate the same byte sequence:

[1-byte tag] [3-byte value] [1-byte tag] [2-byte value] = 7 bytes
[1-byte tag] [2-byte value] [1-byte tag] [3-byte value] = 7 bytes
[1-byte tag] [6-byte value] = 7 bytes
... etc.

If those 7 bytes just happen to be equal, then the serialized messages can NOT be used reliably as keys.

Thoughts?

Thank you.
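For reference, a minimal Java sketch of the two wire-format pieces the second question hinges on. It is hand-written for illustration; the class and method names are hypothetical and are not the protobuf library's API. A field tag is (field_number << 3) | wire_type, written as a varint, and a base-128 varint stores 7 bits per byte, setting the high bit on every byte except the last, so a decoder can always tell where each varint ends. The sketch says nothing about the first question, whether a serializer always writes fields in the same order.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of varint fields on the wire; not the protobuf library's API.
final class WireSketch {
    static final int WIRETYPE_VARINT = 0;

    /** Unsigned base-128 varint: 7 bits per byte, high bit set on every byte but the last. */
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Writes the tag (field_number << 3 | wire_type) as a varint, then the value. */
    static void writeVarintField(ByteArrayOutputStream out, int fieldNumber, long value) {
        writeVarint(out, ((long) fieldNumber << 3) | WIRETYPE_VARINT);
        writeVarint(out, value);
    }

    /** Decodes consecutive [tag][varint] pairs into {fieldNumber, value} arrays. */
    static List<long[]> readVarintFields(byte[] data) {
        List<long[]> fields = new ArrayList<>();
        int[] pos = {0};
        while (pos[0] < data.length) {
            long tag = readVarint(data, pos);
            // Only wire type 0 (varint) is handled in this sketch.
            if ((tag & 0x7) != WIRETYPE_VARINT) {
                throw new IllegalArgumentException("unsupported wire type");
            }
            fields.add(new long[] {tag >>> 3, readVarint(data, pos)});
        }
        return fields;
    }

    /** Reads one varint; the first byte with a clear high bit is its last byte. */
    private static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarintField(out, 1, 300);
        writeVarintField(out, 2, 5);
        for (byte b : out.toByteArray()) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}
```

Encoding field 1 = 300 and field 2 = 5 this way gives the bytes 08 ac 02 10 05: the 0x80 bit on ac tells the reader that 02 still belongs to the same value.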
Re: Java deserialization - any best practices for performances?
Hi,

I haven't actually used the Java protobuf API, but from the occasional quick glance it seems to me that this isn't entirely true. Specifically, in response to the code snippet posted in the original message, I would possibly:

1. Reuse the Builder object by calling its clear() method. This would avoid creating a new Builder object on each iteration of the outermost loop.
2. Iterate over the repeated field using the get*Count() and get*(index) methods instead of the get*List() method. I'm not sure if this would save anything, but depending on how the generated code is implemented, it could avoid allocating a new List object.

Also, might bytes fields perform better than any string fields that you may have in your particular data set? I'm not sure, but it might be worth benchmarking.

On Jul 18, 9:22 pm, Kenton Varda ken...@google.com wrote:

On Fri, Jul 17, 2009 at 8:13 PM, Alex Black a...@alexblack.ca wrote:

When I write out messages using C++ I'm careful to clear messages and re-use them. Is there something equivalent on the Java side when reading those same messages in?

No. Sorry. This just doesn't fit at all with the Java library's design, and even if it did, you cannot reuse Java String objects, which often account for most of the memory usage. However, memory allocation is cheaper in Java than in C++, so there's less to gain from it.

My code looks like:

CodedInputStream stream = CodedInputStream.newInstance(inputStream);
while (!stream.isAtEnd()) {
    MyMessage.Builder builder = MyMessage.newBuilder();
    stream.readMessage(builder, null);
    MyMessage myMessage = builder.build();

    for (MessageValue messageValue : myMessage.getValuesList()) {
        // ...
    }
}

I'm passing 150 messages each with 1000 items, so presumably memory is allocated 150 times for each of the messages...

- Alex
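Suggestion 2 above, sketched in plain Java with a java.util.List standing in for the repeated field. The class RepeatedIteration is hypothetical; in generated code the index-based form would use the field's count and indexed getters (for the values field in the quoted snippet, getValuesCount() and getValues(i)).

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of the two iteration styles; a List stands in for the
// repeated field. RepeatedIteration is a hypothetical name.
final class RepeatedIteration {
    /** Enhanced for loop: calls values.iterator(), which normally allocates an Iterator. */
    static long sumWithIterator(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    /** Count-and-index loop: never asks the list for an Iterator. */
    static long sumByIndex(List<Long> values) {
        long sum = 0;
        for (int i = 0, n = values.size(); i < n; i++) {
            sum += values.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Long> values = new ArrayList<>();
        for (long v = 1; v <= 1000; v++) {
            values.add(v);
        }
        System.out.println(sumWithIterator(values) + " " + sumByIndex(values));
    }
}
```

The index loop is only cheap on random-access lists such as ArrayList; on a LinkedList each get(i) walks the list. Whether skipping the Iterator matters at all depends on the JIT, which can often eliminate short-lived allocations like this through escape analysis, so a benchmark is the only real answer.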
Re: Java deserialization - any best practices for performances?
Hi Kenton,

Thanks for your reply.

You can't continue to use a Builder after calling build(). Even if we made it so you could, it would be building an entirely new object, not reusing the old one. We can't make it reuse the old one because that would break the immutability guarantee of message objects.

Hmm... that strikes me as strange. I understand that the Message objects are immutable, but the Builders are as well? I thought that they would work more along the lines of String and StringBuilder, where String is obviously immutable and StringBuilder is mutable/reusable.

But seriously, object allocation with a modern generational garbage collector is extremely cheap, especially for objects that don't stick around very long. So I don't think there's much to gain here.

While I agree that object allocation is relatively cheap in Java, I have noticed that if you generate a lot of garbage, you also have to spend some time tweaking the garbage collector settings to avoid long/frequent garbage collection pauses. I know that there has been a lot of recent work done in Java 7 (and experimentally in Java 6) to avoid this, but I haven't had the opportunity to test it yet. In fact, I often find that this is the real difference in performance between Java and C++ in the cases where C++ seems to perform significantly faster: different object allocation practices (but, more importantly, implementation/design choices). I don't know how well this holds true across a spectrum of usage patterns, but my experience has been more on the large-scale data processing side of things. And don't get me wrong, I'm actually one of the few people (among my closest colleagues) who think that data processing can and should be done in Java rather than C++, but that's another discussion entirely :)

But while we're on the subject, I have been looking for some rough benchmarks comparing the performance of Protocol Buffers in Java versus C++. Do you (the collective you) have any [rough] idea how they compare performance-wise? I am thinking more in terms of batch-style processing (disk I/O, parsing-centric) rather than RPC-centric usage patterns. Any experiences you can share would be great.

Thanks!
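The String/StringBuilder analogy above, as a plain-JDK sketch (BuilderReuse and joinEach are hypothetical names): one StringBuilder is reset with setLength(0) and refilled on every iteration, while each finished value is copied out into an immutable String. Kenton's reply says message Builders can't be used after build(), so this reuse pattern describes StringBuilder, not protobuf Builders.

```java
// Plain-JDK illustration of a reusable mutable builder producing immutable results.
// BuilderReuse and joinEach are hypothetical names.
final class BuilderReuse {
    /** Joins each row's cells with commas, reusing a single StringBuilder for all rows. */
    static String[] joinEach(String[][] rows) {
        String[] out = new String[rows.length];
        StringBuilder sb = new StringBuilder(); // allocated once, reused below
        for (int r = 0; r < rows.length; r++) {
            sb.setLength(0); // clears the contents but keeps the allocated buffer
            String[] row = rows[r];
            for (int c = 0; c < row.length; c++) {
                if (c > 0) {
                    sb.append(',');
                }
                sb.append(row[c]);
            }
            out[r] = sb.toString(); // copies into a new, immutable String
        }
        return out;
    }

    public static void main(String[] args) {
        String[] joined = joinEach(new String[][] {{"John Smith", "mailman"}, {"a", "b", "c"}});
        for (String s : joined) {
            System.out.println(s);
        }
    }
}
```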