Re: [protobuf] Enum values are siblings of their type, not children of it.

2010-08-25 Thread alopecoid
It could be made an option. Something like...

  message MyMessage {
    enum Foo {
      option cpp_namespace = true;
      FIRST = 0;
      ...
    }
    ...
  }

...and then could be made the default if there is ever a Proto3.

But even if this just ends up on some if-there-ever-is-a-Proto3
to-do list, that would be cool... just so it's not forgotten about.
It's one of those small things that make the API nicer. I've come
across this scenario several times (and so have other colleagues).

On Tue, Aug 24, 2010 at 5:56 PM, Kenton Varda ken...@google.com wrote:
 How would you make this change without updating millions of lines of
 existing C++ code that uses protobuf enums?

 On Fri, Aug 20, 2010 at 11:12 AM, alopecoid alopec...@gmail.com wrote:

 Hi,

 This post is about the fact that protobuf enum values use C++ scoping
 rules, meaning that, unlike in Java, enum values are siblings of their
 type, not children of it.

 Say I have the following contrived message:

  message MyMessage {
    enum Foo {
      FIRST = 0;
      SECOND = 1;
      BOTH = 2;
    }
    required Foo foo = 1;

    enum Bar {
      FIRST = 0;
      SECOND = 1;
      BOTH = 2;
    }
    required Bar bar = 2;
  }

 This wouldn't compile because the protobuf compiler recognizes the
 fact that for C++, the generated enum values for Foo and Bar would
 conflict with each other.

 However, for Java, this wouldn't be a problem. Instead of punishing the
 generated Java code because of C++'s strange enum behavior (by forcing
 developers to rename their enum values even though they don't collide),
 I would like to propose that the generated C++ enum declarations be
 wrapped in their own nested namespaces. For example, something like:

  namespace Foo {
    enum Enum {
      FIRST = 0,
      SECOND = 1,
      BOTH = 2
    };
  }

  namespace Bar {
    enum Enum {
      FIRST = 0,
      SECOND = 1,
      BOTH = 2
    };
  }

 At this point, the enum values would be accessed like Foo::FIRST,
 Bar::FIRST, etc, which would eliminate the enum value collision
 problem altogether, and at the same time make them appear to behave
 more like Java's enum scoping rules (which arguably make more sense).
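
 To illustrate the Java side of the comparison, here is a minimal
 plain-Java sketch. It is hand-written rather than protoc output, and the
 class name MyMessageSketch is hypothetical. Each nested enum is its own
 scope, so identically named constants do not collide:

```java
// Hand-written illustration of Java enum scoping; not generated by protoc.
class MyMessageSketch {
    // Both enums declare the same constant names without conflict,
    // because each constant lives inside its own enum type.
    enum Foo { FIRST, SECOND, BOTH }
    enum Bar { FIRST, SECOND, BOTH }

    public static void main(String[] args) {
        Foo foo = Foo.FIRST;
        Bar bar = Bar.FIRST;
        // getDeclaringClass() returns the enum type the constant belongs to.
        System.out.println(foo.getDeclaringClass().getSimpleName() + "." + foo); // Foo.FIRST
        System.out.println(bar.getDeclaringClass().getSimpleName() + "." + bar); // Bar.FIRST
    }
}
```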

 Thoughts?

 Thank you.

 --
 You received this message because you are subscribed to the Google Groups
 Protocol Buffers group.
 To post to this group, send email to proto...@googlegroups.com.
 To unsubscribe from this group, send email to
 protobuf+unsubscr...@googlegroups.com.
 For more options, visit this group at
 http://groups.google.com/group/protobuf?hl=en.







[protobuf] Concatenation of adjacent strings in text-formatted protobuf message (Java API)

2009-10-30 Thread alopecoid

Hi,

Using the Java API, when attempting to parse a text-formatted protobuf
message that contains adjacent strings that are meant to be
concatenated, such as in the following contrived example:

  name: "John Smith"
  profession: "mailman"
  description:
  "all these strings "
  "are concatenated to form "
  "a single very long description"

The following exception is thrown:
  Exception in thread "main" com.google.protobuf.TextFormat$ParseException: 3:5: Expected identifier.
    at com.google.protobuf.TextFormat$Tokenizer.parseException(TextFormat.java:698)
    at com.google.protobuf.TextFormat$Tokenizer.consumeIdentifier(TextFormat.java:525)
    at com.google.protobuf.TextFormat.mergeField(TextFormat.java:851)
    at com.google.protobuf.TextFormat.merge(TextFormat.java:811)
    at com.google.protobuf.TextFormat.merge(TextFormat.java:757)

Is this not meant to be supported?
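
The behavior being asked about can be sketched in plain Java, assuming
the description lines are double-quoted string literals. This is only an
illustration of the expected result, not how TextFormat is implemented;
the class AdjacentStrings and the method concatAdjacent are hypothetical
names, and escape sequences are deliberately ignored:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: joins adjacent quoted literals the way C-style
// string literal concatenation would. Not part of the protobuf library.
class AdjacentStrings {
    // Matches one double-quoted literal; escapes are not handled (a simplification).
    private static final Pattern LITERAL = Pattern.compile("\"([^\"]*)\"");

    // Appends the contents of every quoted literal found in the input.
    // Anything between literals is skipped; a real tokenizer would
    // accept only whitespace there.
    static String concatAdjacent(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = LITERAL.matcher(input);
        while (m.find()) {
            out.append(m.group(1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "\"all these strings \"\n"
                     + "  \"are concatenated to form \"\n"
                     + "  \"a single very long description\"";
        System.out.println(concatAdjacent(value));
    }
}
```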

Thank you.



Can serialized messages be used reliably as keys?

2009-09-29 Thread alopecoid

Hi,

Can serialized messages be used reliably as keys?

In other words, is it guaranteed that...

- Two equal messages will always generate equal byte sequences?
(Are fields always written in the same order?)

- Two unequal messages will always generate unequal byte sequences?
(Are tag identifiers enough to delimit variable length fields from
accidentally producing equal byte sequences?)

I have a feeling that the answer is no. For example, given a proto
with two fields, both variable length int64 types, it seems that two
unequal messages could, by chance, generate the same byte sequence:

[1 byte tag] [3 byte value] [1 byte tag] [2 byte value] = 7 bytes
[1 byte tag] [2 byte value] [1 byte tag] [3 byte value] = 7 bytes
[1 byte tag] [6 byte value] = 7 bytes
... etc.

If those 7 bytes just happen to be equal, then the serialized messages
can NOT be used reliably as keys.
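
A minimal sketch for experimenting with this question, assuming the
varint wire format from the protobuf encoding documentation: a tag is
(field number << 3) | wire type, and every varint byte except the last
has its high bit set. The code is hand-rolled and does not call the
protobuf library; the class and method names are hypothetical. It shows
the bytes for two same-length messages and says nothing about whether
the library guarantees a field order:

```java
import java.io.ByteArrayOutputStream;

// Hand-rolled varint encoder for a message with two int64 fields.
class VarintKeySketch {
    // Writes value as a base-128 varint: 7 payload bits per byte,
    // with the high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encodes fields 1 and 2 (both wire type 0, varint) in field-number order.
    static byte[] encode(long first, long second) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (1 << 3) | 0); // tag for field 1
        writeVarint(out, first);
        writeVarint(out, (2 << 3) | 0); // tag for field 2
        writeVarint(out, second);
        return out.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(300, 1))); // 08 ac 02 10 01
        System.out.println(hex(encode(1, 300))); // 08 01 10 ac 02
    }
}
```

Both messages occupy five bytes, but the continuation bit marks where
each value ends, so the two byte sequences differ.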

Thoughts?

Thank you.



Re: Java deserialization - any best practices for performances?

2009-07-23 Thread alopecoid

Hi,

I haven't actually used the Java protobuf API, but it seems to me from
the quick occasional glance that this isn't entirely true. I mean,
specifically in response to the code snippet posted in the original
message, I would possibly:

1. Reuse the Builder object by calling its clear() method. This would
avoid the need to create a new Builder object for each iteration
of the outermost loop.

2. Iterate over the repeated field using the get*Count() and
get*(index) methods instead of the get*List() method. I'm not sure if this
would save anything, but depending on how things are implemented in
the generated code, this could avoid allocating a new List object.

Also, might bytes type fields perform better than any string type
fields that you may have in your particular data set? I'm not sure,
but it might be worth benchmarking.

On Jul 18, 9:22 pm, Kenton Varda ken...@google.com wrote:
 On Fri, Jul 17, 2009 at 8:13 PM, Alex Black a...@alexblack.ca wrote:

  When I write out messages using C++ I'm careful to clear messages and
  re-use them, is there something equivalent on the java side when
  reading those same messages in?

 No.  Sorry.  This just doesn't fit at all with the Java library's design,
 and even if it did, you cannot reuse Java String objects, which often
 account for most of the memory usage.  However, memory allocation is cheaper
 in Java than in C++, so there's less to gain from it.



  My code looks like:

  CodedInputStream stream = CodedInputStream.newInstance(inputStream);

  while ( !stream.isAtEnd() )
  {
      MyMessage.Builder builder = MyMessage.newBuilder();
      stream.readMessage(builder, null);
      MyMessage myMessage = builder.build();

      for ( MessageValue messageValue : myMessage.getValuesList() )
      {
         ..
      }
  }

  I'm passing 150 messages each with 1000 items, so presumably memory is
  allocated 150 times for each of the messages...

  - Alex



Re: Java deserialization - any best practices for performances?

2009-07-23 Thread alopecoid

Hi Kenton,

Thanks for your reply.

 You can't continue to use a Builder after calling build().  Even if we made
 it so you could, it would be building an entirely new object, not reusing
 the old one.  We can't make it reuse the old one because that would break
 the immutability guarantee of message objects.

Hmm... that strikes me as strange. I understand that the Message
objects are immutable, but the Builders are as well? I thought that
they would work more along the lines of String and StringBuilder,
where String is obviously immutable and StringBuilder is
mutable/reusable.
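
The String/StringBuilder analogy can be sketched in plain Java as
follows. Nothing here is protobuf-specific, and the class and method
names are hypothetical; it only shows one mutable builder being reset and
reused while each toString() result stays an independent immutable
value:

```java
// Plain-Java illustration of reusing a mutable builder to produce
// several immutable results.
class BuilderReuseSketch {
    // Builds two strings from a single StringBuilder instance.
    static String[] buildTwice() {
        StringBuilder sb = new StringBuilder();

        sb.append("first");
        String a = sb.toString(); // immutable snapshot of the current contents

        sb.setLength(0);          // reset the same builder instead of allocating a new one
        sb.append("second");
        String b = sb.toString(); // a is unaffected by the reuse

        return new String[] { a, b };
    }

    public static void main(String[] args) {
        String[] result = buildTwice();
        System.out.println(result[0] + " " + result[1]); // first second
    }
}
```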

 But seriously, object allocation with a modern generational garbage
 collector is extremely cheap, especially for objects that don't stick around
 very long.  So I don't think there's much to gain here.

While I agree that object allocation is relatively cheap in Java, I
have noticed that if you generate a lot of garbage, you also have to
spend some time tweaking the garbage collector settings to avoid long
or frequent garbage collection pauses. I know that there has been a lot
of recent work done in Java 7 (and experimentally in Java 6) to avoid
this, but I haven't had the opportunity to test it yet. In fact, I
find that this is often the real difference in performance
between Java and C++ in the cases where C++ seems to perform
significantly faster: different object allocation practices (and,
more importantly, implementation/design choices). I don't know how
well this holds true for a spectrum of different usage patterns, but
my experience has been more from the large scale data processing side
of things. And don't get me wrong, I'm actually one of the few people
(out of my closest colleagues) who think that data processing can and
should be done in Java over C++, but that's another discussion
entirely :)

But while we're on the subject, I have been looking for some rough
benchmarks comparing the performance of Protocol Buffers in Java
versus C++. Do you (the collective you) have any [rough] idea as to
how they compare performance-wise? I am thinking more in terms of
batch-style processing (disk I/O, parsing centric) rather than RPC
centric usage patterns. Any experiences you can share would be great.

Thanks!