Some further thoughts on the nature of bang, question, ref, and val.

The model outlined in my mail from yesterday accounted for the distinction between class and type, but left something important out: carriers.  Adding these into the mix, I think this clarifies why `.val` and `!` are different, and why `!` and `?` are not pure inverses.

The user declares _classes_, which includes identity and value classes.  Ignoring generics for the moment, we derive _types_ from classes.  Identity classes give rise to a single principal type (whose name is the written the same as the class, but let's call this `C.ref` for clarity); value classes give rise to two principal types, `C.ref` and `C.val`.

So `val` and `ref` are functions from Class to Type (val is partial):

    val :: ValueClass -> Type
    ref :: Class -> Type

What's missing is Carrier.  Ignoring the legacy primitive carriers (I, J, F, D), we have two carriers, L and Q.  Every type has a carrier.  For the "ref" types, the carrier is L; for the "val" types, the carrier is Q:

    carrier ref T = L
    carrier val T = Q

Now, bang and question.  These are operators on types.  Bang restricts the value set; question (potentially) augments the value set to include null.  Question is best describe as yielding a union type: `T? === T|Null`.  (Note that for all reference types T, T|Null == T, because Null <: T.)

What are the carriers for bang and question types?  We define the carrier on union types by taking the stronger of the two carriers:

    carrier T|U = max (carrier T) (carrier U)

which means that

    carrier question T = L

since we need an L carrier to represent null.  But for "bang", we can preserve the carrier, since we're representing fewer values:

    carrier bang T = carrier T

(Why wouldn't we downgrade the carrier of `Point!` to Q? Because the carrier means more than nullity; it affects atomicity, layout, initialization strategy, etc.)

What this means is that `question` is always information-losing, and that:

    carrier bang question T = L
    carrier question bang T = L

So, the ugly fact here is that "bang" and "question" are not inverses; `T!?` is not always T, nor is `T?!`.

But what I want to know is this: how do we want to denote "T or null", when T is a type variable?  This turns out to be the only place we currently have to utter `.ref`.  And uttering `.ref` here feels like asking the user to do the language's job; what the user wants is to describe the union type "T|Null".  (Since the only sensible representation for this is a reference type, the language will translate it as such anyway, but that's the language's job.)

This is related to how we ask people to describe "nullable int".  There are three choices: `int?`, `int.ref`, and `Integer`.  I would argue that the first is closest to what the user wants: a statement about value sets.  `int.ref` brings in carriers, which is unrelated to what the user really wants here; `Integer` is even worse because the relationship between int and Integer is ad-hoc.  Of course, they will all translate the same way (the L carrier), but that's the compiler's job.

For the only remaining use of `.ref` (returning V.ref from Map::get and friends), I think we want the same; Map::get wants to return "V or null".  Again, ref-ness is a dependent thing, not the essence; the essence is "T|Null".  (Also there's a connection with type patterns, where we may want to expand a null-rejecting type pattern to a null-including one.)

The problem, of course, is that once people see `?`, they will think it is "obvious" that we left out "!" by mistake, because of course they go together.  But they don't, really; they're different things.  But let's set bang aside, and turn to Kevin's next question, which is: if `?` is a union type with the null type, what does that say about `String?`?  This seems to be on a collision course, in that null-analysis efforts would want to treat `String?` as "String, with explicit nullness", but the union interpretation will collapse to just `String`.

Which points the way towards what seems the proper role for bang and question in the surface syntax, if any: to *modify* types with respect to their inclusion of null.  So `String?` and `int!` should probably be errors, since String is already nullable and int is already non-nullable.

Bottom line: as we've discovered half a dozen times already in this project, nearly every time we think that nullity is perfectly correlated to something, we discover it is not. Bang/question are not val/ref; we might be able to get away with using `int.ref` to describe nullable ints, but that doesn't help us at all with nullable or non-nullable type patterns; and none of these are the same as "known vs unknown nullity" (or known vs unknown initialization status.)





On 6/27/2022 2:48 PM, Brian Goetz wrote:
I've been bothered by an uncomfortable feeling that .val and ! are somehow different in nature, but haven't been able to put my finger on it.  Let me make another attempt.

The "bang" and "question" operators operate on types.  In the strictest form, the bang operator takes a type that has null in its value set, and returns a type whose value set is the same, except for null.   But observe that if the value set contains null, then the type has to be a reference type.  And the resulting type also has to be a reference type (except maybe for weird classes like Void) because we're preserving the remaining values, which are references.  So we could say:

    bang :: RefType -> RefType

Bang doesn't change the ref-ness, or id-ness, of a type, it just excludes a specific value from the value set.

Now, what do ref and val do?  They don't operate on types, they operates on _classes_, to produce a type.  Val can only be applied to value classes, and produces a value type.  In the strictest interpretation (for consistency with bang), ref also only operates on value classes.  So:

    val :: ValClass -> ValType
    ref :: ValClass -> RefType

Now, we've been strict with bang and ref to say they only work when they have a nontrivial effect, and could totalize them in the obvious way (ref is a no-op on an id class; bang is a no-op on a value type.)  Which would give us:

    bang :: Type -> Type
    val :: ValClass -> ValType
    ref :: Class -> RefType

with the added invariant that bang preserves id-ness/val-ness/ref-ness of types.

But still, bang and ref operate on different things, and and produce different things; one takes a type and yields a slightly refined type with similar characteristics, the other takes a class and yields a type with highly specific characteristics.  We can conclude a lot from `val` (its a value type, which already says a lot), but we cannot conclude anything other than  non-nullity from `bang`; it might be a ref or a val type, it might come from an identity or value class.

What this says to me is "val is a subtype of bang"; all vals are bangs, but not all bangs are vals.

A harder problem is what to do about `question`.  The strict interpretation says we can only apply `question` to a type that is already non-null.  In our world, that's ValType.

    question :: ValType -> Type

Or we could totalize as we did with bang, and we get an invariant that question preserves id-ness, val-ness, ref-ness.  But, what does `question` really mean?  Null is a reference.  So there are two interpretations: that question always yields a reference type (which means non-references need to be lifted/boxed), or that question yields a union type.

It turns out that the latter is super-useful on the stack but kind of sucks in the heap.  The return value of `Map::get`, which we've been calling `T.ref`, really wants a union type (T or Null); similarly, many difficult questions in pattern matching might be made less difficult with a `T or Null` Type.  But there is no efficient heap-based representation for such a union type; we could use tagged unions (blech) or just fall back to boxing. Which leaves us with the asymmetry that bang is representation-preserving (as well as other things), but question is not.  (Which makes sense in that one is subtractive and the other is additive.)

So, to your question: is this permanently gross?  I think if we adopt the strictest intepretations:

 - bang is only allowed on types that are already nullable
 - question is only allowed on types that are not nullable (or on type variables)
 - val is only allowed on value classes
 - ref is only allowed on value classes (or on type variables)

(And we can possibly boil away the last one, since if we can say `T?`, there is no need for `T.ref` anywhere.)

What this means is that you can say `String!`, but not `Optional!`, because Optional is already null-free.  Which means there is never any question whether you say `X.val` or `X!` or `X.val!` (or `X.ref!` if we exclude ref entirely).  So then, rather than two ways to say the same thing, there are two ways to say two different things, which have different absolute strengths.

This is somewhat unfortunate, but not "permanently gross."

If we drop `ref` in favor of `?` (not necessarily a slam-dunk), we can consider finding another way to spell `.val` which is less intrusive, though there are not too many options that don't look like line noise.





On 6/15/2022 12:41 PM, Kevin Bourrillion wrote:

* I still am saddled with the deep feeling that ultimate victory here looks like "we don't need a val type, because by capturing the nullness bit and tearability info alone we will make /enough/ usage patterns always-optimizable, and we can live with the downsides". To me the upsides of this simplification are enormous, so if we really must reject it, I may need some help understanding why. It's been stated that a non-null value type means something slightly different from a non-null reference type, but I'm not convinced of this; it's just that sometimes you have the technical ability to conjure a "default" instance and sometimes you don't, but nullness of the type means what it means either way.

    * I think if we plan to go this way (.val), and then we one day
    have a nullable types feature, some things will then be
    permanently gross that I would hope we can avoid. For example,
    nullness *also* demands the concept of bidirectional projection
    of type variables, and for very overlapping reasons. This puts
    things in a super weird place.


Reply via email to