Re: Immutability of generated data structures

Kenton Varda Thu, 23 Apr 2009 18:09:26 -0700

On Thu, Apr 23, 2009 at 3:59 PM, Kannan Goundan <kan...@cakoose.com> wrote:


> The reason I'm asking all this is that I'm implementing a data
> serialization format that has the same usage model as protocol buffers
> (i.e. generate language bindings, serialize/deserialize).
>
> For me, the biggest benefit of immutability here is that it prevents
> cycles in the object graph.  (Well, I think this is true for Java and
> C++.  In Haskell, though, this is not the case).  Without cycles, the
> serialization code doesn't have to worry about getting stuck in an
> infinite loop (though it's more likely that it'll eventually stack
> overflow...).
>
> The downside is the potential inefficiency of making unnecessary
> copies of the data structures.


Note that immutable data structures also allow you to *avoid* copies in a
lot of cases, since there's no need to make defensive copies.  So it's not
necessarily always less efficient -- in some cases it is more efficient.
I'm not sure what the average case is.
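
To make the defensive-copy point concrete, here is a minimal Java sketch.  The classes MutablePoint, ImmutablePoint, and PointCache are hypothetical names made up for illustration; they are not protoc output.  The mutable version has to copy on the way in and on the way out, while the immutable version can simply share the reference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable value: anyone holding a reference can change it later.
final class MutablePoint {
    int x;
    int y;
    MutablePoint(int x, int y) { this.x = x; this.y = y; }
    MutablePoint copy() { return new MutablePoint(x, y); }
}

// Hypothetical immutable value: all fields are final, so sharing is safe.
final class ImmutablePoint {
    final int x;
    final int y;
    ImmutablePoint(int x, int y) { this.x = x; this.y = y; }
}

// Hypothetical container that must protect its contents from outside edits.
final class PointCache {
    private final List<MutablePoint> mutablePoints = new ArrayList<>();
    private final List<ImmutablePoint> immutablePoints = new ArrayList<>();

    // Mutable input: copy on the way in, or the caller can edit our state.
    void store(MutablePoint p) { mutablePoints.add(p.copy()); }

    // Immutable input: no copy needed, the reference is stored directly.
    void store(ImmutablePoint p) { immutablePoints.add(p); }

    // Mutable output: copy again on the way out for the same reason.
    MutablePoint firstMutable() { return mutablePoints.get(0).copy(); }

    // Immutable output: hand back the shared instance.
    ImmutablePoint firstImmutable() { return immutablePoints.get(0); }
}
```

With the mutable type, every store/read pair costs two allocations; with the immutable type it costs none.  Whether that outweighs the copies needed to "modify" an immutable object depends on the workload.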


>  I'm on board with the benefits of
> immutability, so I'm usually willing to take the performance hit, but
> I wasn't sure if others would be as well.  Have you guys gotten any
> requests to add a "generate mutable data structures" mode to protoc?


Not really.  Note that builder objects can be used like mutable data
structures, with some limitations (e.g. the sub-messages aren't mutable).
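
As a rough illustration of that limitation, here is a hand-written Java sketch of the builder pattern.  PersonName and Person are hypothetical classes, not what protoc generates, and the toBuilder() helper is an assumption added for convenience.  The builder's own fields can be set repeatedly, but the nested message it holds is immutable, so changing a nested field means building a replacement sub-message.

```java
// Hypothetical immutable sub-message with its own builder.
final class PersonName {
    private final String first;
    private PersonName(String first) { this.first = first; }
    String getFirst() { return first; }

    static Builder newBuilder() { return new Builder(); }
    // Assumed helper: start a new builder pre-filled from this message.
    Builder toBuilder() { return new Builder().setFirst(first); }

    static final class Builder {
        private String first = "";
        Builder setFirst(String value) { first = value; return this; }
        PersonName build() { return new PersonName(first); }
    }
}

// Hypothetical immutable message that contains a sub-message.
final class Person {
    private final int id;
    private final PersonName name;
    private Person(int id, PersonName name) { this.id = id; this.name = name; }
    int getId() { return id; }
    PersonName getName() { return name; }

    static Builder newBuilder() { return new Builder(); }

    static final class Builder {
        private int id;
        private PersonName name = PersonName.newBuilder().build();
        // Top-level fields behave like a mutable object: set them as often as you like.
        Builder setId(int value) { id = value; return this; }
        Builder setName(PersonName value) { name = value; return this; }
        // The sub-message handed back is immutable; it cannot be edited in place.
        PersonName getName() { return name; }
        Person build() { return new Person(id, name); }
    }
}
```

Editing a nested field through the builder then looks like this:

```java
Person.Builder b = Person.newBuilder().setId(1);
b.setName(b.getName().toBuilder().setFirst("Bob").build());
Person p = b.build();
```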


> If I do end up having to support mutable structures, maybe there's
> some clever way to efficiently prevent cycles in an object graph as it
> is being manipulated.  I really don't want to have to detect cycles in
> the serializer (I think cycle detection is what makes Java's built-in
> serialization so slow...).


Honestly, I don't think you need to worry about it.  It's really not very
easy to accidentally write code which produces a cyclic data structure.  If
someone manages to do it, they'll figure out what went wrong pretty quickly
when their stack overflows.  Preventing cycles was not a consideration that
factored into our design decision.

That said, if you have mutable data structures, then you probably want to
keep track of ownership anyway.  That is, each object should only be
permitted to have one parent.  Otherwise, it's surprising when editing a
sub-object of one object can mysteriously affect some other object
elsewhere.  This is exactly the kind of problem we were trying to prevent.

This would, of course, mean that your system can only be used to build
trees, not DAGs.  But if you don't want to detect cycles in your
serialization algorithm, I'm guessing you aren't interested in detecting
when the same object appears in multiple places.  So your serialization is
really writing trees anyway.
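
To illustrate, here is a small Java sketch with hypothetical DagNode and TreeWriter classes and a made-up text format.  The writer does no identity tracking, so a node reachable through two parents is simply written twice, and a reader of that output would reconstruct two independent copies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type that permits sharing (a DAG, not a tree).
final class DagNode {
    final String label;
    final List<DagNode> children = new ArrayList<>();
    DagNode(String label) { this.label = label; }
}

// Hypothetical serializer with no identity tracking.
final class TreeWriter {
    // Writes the label followed by the children in parentheses.
    // A shared node is emitted once per path that reaches it.
    static String write(DagNode node) {
        StringBuilder sb = new StringBuilder(node.label);
        sb.append('(');
        for (DagNode child : node.children) {
            sb.append(write(child));
        }
        sb.append(')');
        return sb.toString();
    }
}
```

For a root "t" with children "l" and "r" that both point at the same node "s", the output contains "s()" twice, so the sharing is lost on the wire.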


>
>
> - Kannan
>
> On Thu, Apr 23, 2009 at 15:21, Kenton Varda <ken...@google.com> wrote:
> > You're specifically talking about the Java implementation.  We quite
> > intentionally chose to make message objects completely immutable.
> > Version 1 of protocol buffers (never released publicly) used mutable
> > objects, and we found it led to many bugs as people would modify
> > messages that were simultaneously being used elsewhere in the app.  To
> > defend against such bugs, people had to constantly make "defensive
> > copies" of message objects -- often unnecessarily, because it was hard
> > to be sure when a copy was necessary.
> > In C++, we solve this problem using the "const" qualifier, but Java has
> > no "const", so we had to go a different route in Java.
> > The idea to use immutable objects was actually first suggested to me by
> > Josh Bloch (author of Effective Java, and Google employee).  Since I
> > personally am a fan of functional programming, I liked the idea a lot
> > and ran with it.  Most Java developers inside Google seem to think this
> > was a big improvement.
> >
> > On Thu, Apr 23, 2009 at 5:03 AM, Kannan Goundan <kan...@cakoose.com> wrote:
> >>
> >> The code generated by protoc seems to go to great lengths to make sure
> >> that once a message object is created, it can't be modified.  I'm
> >> guessing that this is to avoid cycles in the object graph, so that the
> >> serialization routine doesn't have to detect cycles.  Is this
> >> correct?  Would a cycle in the object graph put the current serializer
> >> into an infinite loop?
>
