Re: [capnproto] v2 modules and versioning

2023-10-31 Thread 'Kenton Varda' via Cap'n Proto
[Sorry for the long delay in replying -- I recently moved into a new house
and have been rather swamped.]

Cap'n Proto follows the Protobuf philosophy of versioning: there are no
versions, or, alternatively, versions are a continuous spectrum. As long as
each incremental change is made in a
backwards-compatible way, then old programs should be able to talk to new
programs and vice versa. If you want to make a breaking change, you make a
whole new type to represent the new protocol. If you want to put "v2" or
whatever in the name of that type, that's up to you.

Cap'n Proto uses 64-bit type IDs to canonically identify a type. Two type
definitions are presumed to be versions of the same type if they have the
same ID. Hence, type *names* are merely a convenience for developers
writing code, but can freely be changed without breaking compatibility. In
pure Cap'n Proto, type IDs are the only global namespace of types, and
collisions there are unlikely because the IDs are chosen randomly. In most
programming languages we are forced to place type names into some sort of
global namespace, but that's up to the language-specific code generator to
deal with.
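
As a rough illustration of how unlikely such collisions are, the Python
sketch below applies the standard birthday approximation. It assumes IDs are
independent and uniformly random over 64 bits; the function name and the
sample sizes are hypothetical, chosen only for this example.

```python
import math


def collision_probability(num_types: int, id_bits: int = 64) -> float:
    """Approximate chance that at least two of num_types random IDs collide.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    Assumption: IDs are independent and uniform over 2**id_bits values.
    """
    space = 2.0 ** id_bits
    exponent = -num_types * (num_types - 1) / (2.0 * space)
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(exponent)


for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} types -> p(collision) ~= {collision_probability(n):.3e}")
```

Under that assumption the probability stays very small until the number of
distinct types reaches the billions.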

> backwards compatibility of protocol versions can be mechanically
> validated.

With Cap'n Proto's SchemaLoader (in C++, at least), when you load two types
with the same ID, it will automatically check them for compatibility and
choose the newer version as the type that the SchemaLoader ultimately gives
to the application. This is done by actually comparing the schema contents.
If the two versions are inherently incompatible, an exception is thrown. I
don't know if version numbers would actually add anything here.
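
The idea can be sketched with a toy model in Python. This is not the
SchemaLoader API or its actual algorithm; `ToyLoader`, `StructSchema`,
`Field`, the compatibility rule (existing ordinals must keep their type, new
fields may be appended) and the type ID are all assumptions made for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    ordinal: int
    name: str       # names may change freely; only ordinal and type are compared
    type_name: str


@dataclass(frozen=True)
class StructSchema:
    type_id: int
    display_name: str
    fields: tuple


class ToyLoader:
    """Keeps one schema per type ID, preferring the newer compatible version."""

    def __init__(self):
        self._by_id = {}

    def load(self, schema: StructSchema) -> None:
        existing = self._by_id.get(schema.type_id)
        if existing is None:
            self._by_id[schema.type_id] = schema
            return
        # Treat the version with more fields as the newer one.
        older, newer = sorted((existing, schema), key=lambda s: len(s.fields))
        newer_types = {f.ordinal: f.type_name for f in newer.fields}
        for f in older.fields:
            # Every older field must reappear with the same ordinal and type.
            if newer_types.get(f.ordinal) != f.type_name:
                raise ValueError(
                    f"type {schema.type_id:#x}: field @{f.ordinal} is incompatible")
        self._by_id[schema.type_id] = newer

    def get(self, type_id: int) -> StructSchema:
        return self._by_id[type_id]


PERSON_ID = 0xA1B2C3D4E5F60718  # hypothetical type ID
v1 = StructSchema(PERSON_ID, "Person", (Field(0, "name", "Text"),))
v2 = StructSchema(PERSON_ID, "PersonRecord",
                  (Field(0, "fullName", "Text"), Field(1, "age", "UInt8")))

loader = ToyLoader()
loader.load(v2)
loader.load(v1)  # load order does not matter; the newer version is kept
print(loader.get(PERSON_ID).display_name)  # prints "PersonRecord"
```

Note that the rename from `name` to `fullName` passes the check, since only
the ID, ordinals and types are compared.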

Of course, it's entirely possible that a newer version of some software has
changed the interpretation of a schema, without changing the actual
definition in any way that is detectably incompatible. Obviously, it's
fundamentally impossible to detect such incompatibility in an automated way.
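
A small Python sketch of that failure mode, using an invented one-field
layout: the writer and the reader agree byte-for-byte on the structure, yet
disagree on what the number means. The function names and the units are
hypothetical.

```python
import struct

# Both programs share an identical layout: one little-endian 32-bit unsigned field.
LAYOUT = "<I"


def encode_old(distance_m: int) -> bytes:
    """Older writer: the field holds a distance in metres."""
    return struct.pack(LAYOUT, distance_m)


def decode_new(payload: bytes) -> float:
    """Newer reader: assumes the field holds millimetres, returns metres."""
    (value,) = struct.unpack(LAYOUT, payload)
    return value / 1000.0


payload = encode_old(5)        # writer means 5 m
print(decode_new(payload))     # reader reports 0.005 m; no structural check can notice
```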

-Kenton

[capnproto] v2 modules and versioning

2023-09-20 Thread Jonathan Shapiro
I've been thinking about modules and versioning. CapnProto has an import
mechanism, but it doesn't seem to have a first-class concept of a schema
that can be versioned.

Recently, I've spent a bunch of time working in both TypeScript and Go, and
I had designed a module system for BitC many years ago with some care for
verification. It took some getting used to, but from a developer
perspective I have come to feel that Go has made a good set of
pragmatic decisions by tying code repositories, the cryptographic hashes
they supply, and version tags to modules and versioning, and by
*separating* imported identifier aliasing from the import itself. The
go.mod/go.sum combination seems to handle everything that the node package
system does with regard to version binding, but subjectively feels simpler.
I *do* find that import paths get long, and I sometimes wish that go.mod
had a way to do import path shorthands, but I haven't ever hit a point
where that seemed critical.

[ I definitely do *not* like Go's decision to conflate identifier
capitalization with export. It's cute in a bad way, irritating, and breaks
link compatibility with *everything*. In schemas, the need to distinguish
between public and private things doesn't arise in the same way, because
the whole point is to be publishing the protocol. ]


For CapnProto, I imagine we would call the versioned thing a schema rather
than a module. A pleasant side benefit is that well-defined versions on
schemas offer the possibility that the backwards compatibility of protocol
versions (e.g. v1.1.0 relative to v1.0.0) can be mechanically *validated*,
which seems useful.


Two relevant points are made in the CapnProto language description.
Paraphrasing:

   1. "Symbolic names can collide... which can be hard to detect in large
   systems using different versions of protocols."

   This point is made in the context of discussing type IDs. CapnProto
   needs type IDs for wire encoding reasons, but *this* isn't the right
   argument for having them! It's an argument for a proper module and version
   system. And as an aside, the question of type equivalence in the presence
   of federated protocols is good for a couple of doctoral dissertations.

   2. "Fully qualified names become large and waste space on the wire."

   As has been noted elsewhere, CapnProto's "everything is a namespace"
   leads to *horrifically* long names produced by the generators, so I
   think that ship has already sailed. The Go module system and import design
   limits the length of names in code to "importBinding.typeName". It would
   also help to get rid of the "everything is a namespace" idea.

   The notion of wasted space on the wire because of long names seems like
   a red herring, because I can't see anything in the spec suggesting that
   identifier names ever *appear* on the wire. If they did, and if
   compression is more important than clarity, we should be thinking about a
   compression-friendly renaming similar to what Google does when minifying
   JavaScript.
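
A minimal sketch of what such a minifier-style renaming could look like, in
Python. Nothing here is part of Cap'n Proto; the helper names and the sample
qualified names are hypothetical, and the sketch assumes both peers start
from the same set of names.

```python
import itertools
import string


def short_names():
    """Yield a, b, ..., z, aa, ab, ... in order."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


def build_rename_table(qualified_names):
    """Map each long name to a short alias.

    Aliases are assigned in sorted order, so two peers that start from the
    same set of names derive the same table independently.
    """
    return dict(zip(sorted(set(qualified_names)), short_names()))


names = [
    "com.example.chat.Message.Header.Sender",  # hypothetical qualified names
    "com.example.chat.Message",
    "com.example.chat.Message.Header",
]
table = build_rename_table(names)
for long_name, alias in table.items():
    print(f"{alias} <- {long_name}")
```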


Before I rathole too far on this, does anybody else see this as a thing
worth thinking about for v2?


Jonathan
