Bang, question, ref, and val (was: User model stacking: current status)

Brian Goetz Tue, 28 Jun 2022 12:28:24 -0700

Some further thoughts on the nature of bang, question, ref, and val.

The model outlined in my mail from yesterday accounted for the distinction between class and type, but left something important out: carriers. Adding these into the mix, I think, clarifies why `.val` and `!` are different, and why `!` and `?` are not pure inverses.

The user declares _classes_, which include identity and value classes. Ignoring generics for the moment, we derive _types_ from classes. Identity classes give rise to a single principal type (whose name is written the same as the class, but let's call this `C.ref` for clarity); value classes give rise to two principal types, `C.ref` and `C.val`.


So `val` and `ref` are functions from Class to Type (val is partial):

    val :: ValueClass -> Type
    ref :: Class -> Type

What's missing is Carrier. Ignoring the legacy primitive carriers (I, J, F, D), we have two carriers, L and Q. Every type has a carrier. For the "ref" types, the carrier is L; for the "val" types, the carrier is Q:


    carrier ref T = L
    carrier val T = Q
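
To make the class/type/carrier layering concrete, here is a minimal sketch in Java. Everything in it (`CarrierSketch`, `ClassDecl`, `PrincipalType`, and so on) is a hypothetical name invented for illustration; none of it is JDK API or taken from any prototype. It only mirrors the three mappings above: `ref` is total, `val` is partial, and the carrier follows from the projection.

```java
import java.util.Optional;

// Hypothetical model of the class -> type -> carrier mapping described above.
final class CarrierSketch {
    enum ClassKind { IDENTITY, VALUE }
    enum Projection { REF, VAL }
    enum Carrier { L, Q }

    // A declared class, reduced to its name and kind.
    record ClassDecl(String name, ClassKind kind) {}

    // A principal type derived from a class: C.ref or C.val.
    record PrincipalType(ClassDecl decl, Projection projection) {}

    // ref :: Class -> Type (total: every class has a ref type)
    static PrincipalType ref(ClassDecl c) {
        return new PrincipalType(c, Projection.REF);
    }

    // val :: ValueClass -> Type (partial: empty for identity classes)
    static Optional<PrincipalType> val(ClassDecl c) {
        return c.kind() == ClassKind.VALUE
                ? Optional.of(new PrincipalType(c, Projection.VAL))
                : Optional.empty();
    }

    // carrier ref T = L; carrier val T = Q
    static Carrier carrier(PrincipalType t) {
        return t.projection() == Projection.REF ? Carrier.L : Carrier.Q;
    }
}
```

With a value class `Point` and an identity class `String` modeled this way, `val` of `String` is empty, while `Point` yields one type on each carrier.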

Now, bang and question. These are operators on types. Bang restricts the value set; question (potentially) augments the value set to include null. Question is best described as yielding a union type: `T? === T|Null`. (Note that for all reference types T, T|Null == T, because Null <: T.)

What are the carriers for bang and question types? We define the carrier on union types by taking the stronger of the two carriers:


    carrier T|U = max (carrier T) (carrier U)

which means that

    carrier question T = L

since we need an L carrier to represent null. But for "bang", we can preserve the carrier, since we're representing fewer values:


    carrier bang T = carrier T

(Why wouldn't we downgrade the carrier of `Point!` to Q? Because the carrier means more than nullity; it affects atomicity, layout, initialization strategy, etc.)


What this means is that `question` is always information-losing, and that:

    carrier bang question T = L
    carrier question bang T = L

So, the ugly fact here is that "bang" and "question" are not inverses; `T!?` is not always T, nor is `T?!`.
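
The carrier rules for bang and question can be sketched as follows. This is an illustrative model only: `BangQuestionSketch` and its members are hypothetical names, a type is reduced to just its carrier and whether null is in its value set, and L is assumed to be the "stronger" carrier in the sense used for `max` above.

```java
// Hypothetical model: a type is reduced to a carrier plus a nullability flag.
final class BangQuestionSketch {
    // Declared weakest-first, so enum ordering agrees with "max".
    enum Carrier { Q, L }

    record Type(String name, Carrier carrier, boolean admitsNull) {}

    // carrier T|U = max (carrier T) (carrier U)
    static Carrier max(Carrier a, Carrier b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    // question T = T|Null; representing null needs the L carrier,
    // so the result always lands on L.
    static Type question(Type t) {
        return new Type(t.name() + "?", max(t.carrier(), Carrier.L), true);
    }

    // bang T removes null from the value set but preserves the carrier.
    static Type bang(Type t) {
        return new Type(t.name() + "!", t.carrier(), false);
    }
}
```

Starting from a type on the Q carrier, such as `Point.val`, both `question(bang(t))` and `bang(question(t))` come out on the L carrier, so neither composition gives back the original type.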

But what I want to know is this: how do we want to denote "T or null", when T is a type variable? This turns out to be the only place we currently have to utter `.ref`. And uttering `.ref` here feels like asking the user to do the language's job; what the user wants is to describe the union type "T|Null". (Since the only sensible representation for this is a reference type, the language will translate it as such anyway, but that's the language's job.)

This is related to how we ask people to describe "nullable int". There are three choices: `int?`, `int.ref`, and `Integer`. I would argue that the first is closest to what the user wants: a statement about value sets. `int.ref` brings in carriers, which is unrelated to what the user really wants here; `Integer` is even worse because the relationship between int and Integer is ad-hoc. Of course, they will all translate the same way (the L carrier), but that's the compiler's job.

For the only remaining use of `.ref` (returning V.ref from Map::get and friends), I think we want the same; Map::get wants to return "V or null". Again, ref-ness is a dependent thing, not the essence; the essence is "T|Null". (Also there's a connection with type patterns, where we may want to expand a null-rejecting type pattern to a null-including one.)

The problem, of course, is that once people see `?`, they will think it is "obvious" that we left out "!" by mistake, because of course they go together. But they don't, really; they're different things. But let's set bang aside, and turn to Kevin's next question, which is: if `?` is a union type with the null type, what does that say about `String?`? This seems to be on a collision course, in that null-analysis efforts would want to treat `String?` as "String, with explicit nullness", but the union interpretation will collapse to just `String`.

Which points the way towards what seems the proper role for bang and question in the surface syntax, if any: to *modify* types with respect to their inclusion of null. So `String?` and `int!` should probably be errors, since String is already nullable and int is already non-nullable.
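
As a rough illustration of this "modifier" reading, the check could look like the sketch below. The names (`StrictNullModifiers`, `Type`) are hypothetical, a type is reduced to whether it admits null, and type variables are deliberately not modeled; this is one possible way to phrase the rule, not a proposed implementation.

```java
// Hypothetical checker: bang and question must actually change the value set.
final class StrictNullModifiers {
    record Type(String name, boolean admitsNull) {}

    // `T?` is accepted only when T does not already admit null.
    static Type question(Type t) {
        if (t.admitsNull()) {
            throw new IllegalArgumentException(t.name() + " is already nullable");
        }
        return new Type(t.name() + "?", true);
    }

    // `T!` is accepted only when T currently admits null.
    static Type bang(Type t) {
        if (!t.admitsNull()) {
            throw new IllegalArgumentException(t.name() + " is already non-nullable");
        }
        return new Type(t.name() + "!", false);
    }
}
```

Under this model `int?` and `String!` are accepted, while `String?` and `int!` are rejected as redundant.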

Bottom line: as we've discovered half a dozen times already in this project, nearly every time we think that nullity is perfectly correlated to something, we discover it is not. Bang/question are not val/ref; we might be able to get away with using `int.ref` to describe nullable ints, but that doesn't help us at all with nullable or non-nullable type patterns; and none of these are the same as "known vs unknown nullity" (or known vs unknown initialization status).

On 6/27/2022 2:48 PM, Brian Goetz wrote:

I've been bothered by an uncomfortable feeling that .val and ! are somehow different in nature, but haven't been able to put my finger on it. Let me make another attempt.

The "bang" and "question" operators operate on types. In the strictest form, the bang operator takes a type that has null in its value set, and returns a type whose value set is the same, except for null. But observe that if the value set contains null, then the type has to be a reference type. And the resulting type also has to be a reference type (except maybe for weird classes like Void) because we're preserving the remaining values, which are references. So we could say:

    bang :: RefType -> RefType

Bang doesn't change the ref-ness, or id-ness, of a type, it just excludes a specific value from the value set.

Now, what do ref and val do? They don't operate on types, they operate on _classes_, to produce a type. Val can only be applied to value classes, and produces a value type. In the strictest interpretation (for consistency with bang), ref also only operates on value classes. So:

    val :: ValClass -> ValType
    ref :: ValClass -> RefType

Now, we've been strict with bang and ref to say they only work when they have a nontrivial effect, and could totalize them in the obvious way (ref is a no-op on an id class; bang is a no-op on a value type.) Which would give us:

    bang :: Type -> Type
    val :: ValClass -> ValType
    ref :: Class -> RefType

with the added invariant that bang preserves id-ness/val-ness/ref-ness of types.

But still, bang and ref operate on different things, and produce different things; one takes a type and yields a slightly refined type with similar characteristics, the other takes a class and yields a type with highly specific characteristics. We can conclude a lot from `val` (it's a value type, which already says a lot), but we cannot conclude anything other than non-nullity from `bang`; it might be a ref or a val type, it might come from an identity or value class.

What this says to me is "val is a subtype of bang"; all vals are bangs, but not all bangs are vals.

A harder problem is what to do about `question`. The strict interpretation says we can only apply `question` to a type that is already non-null. In our world, that's ValType.

    question :: ValType -> Type

Or we could totalize as we did with bang, and we get an invariant that question preserves id-ness, val-ness, ref-ness. But, what does `question` really mean? Null is a reference. So there are two interpretations: that question always yields a reference type (which means non-references need to be lifted/boxed), or that question yields a union type.

It turns out that the latter is super-useful on the stack but kind of sucks in the heap. The return value of `Map::get`, which we've been calling `T.ref`, really wants a union type (T or Null); similarly, many difficult questions in pattern matching might be made less difficult with a `T or Null` type. But there is no efficient heap-based representation for such a union type; we could use tagged unions (blech) or just fall back to boxing. Which leaves us with the asymmetry that bang is representation-preserving (as well as other things), but question is not. (Which makes sense in that one is subtractive and the other is additive.)

So, to your question: is this permanently gross? I think if we adopt the strictest interpretations:

 - bang is only allowed on types that are already nullable
 - question is only allowed on types that are not nullable (or on type variables)
 - val is only allowed on value classes
 - ref is only allowed on value classes (or on type variables)

(And we can possibly boil away the last one, since if we can say `T?`, there is no need for `T.ref` anywhere.)

What this means is that you can say `String!`, but not `Optional!`, because Optional is already null-free. Which means there is never any question whether you say `X.val` or `X!` or `X.val!` (or `X.ref!` if we exclude ref entirely). So then, rather than two ways to say the same thing, there are two ways to say two different things, which have different absolute strengths.

This is somewhat unfortunate, but not "permanently gross."

If we drop `ref` in favor of `?` (not necessarily a slam-dunk), we can consider finding another way to spell `.val` which is less intrusive, though there are not too many options that don't look like line noise.

On 6/15/2022 12:41 PM, Kevin Bourrillion wrote:

* I still am saddled with the deep feeling that ultimate victory here looks like "we don't need a val type, because by capturing the nullness bit and tearability info alone we will make /enough/ usage patterns always-optimizable, and we can live with the downsides". To me the upsides of this simplification are enormous, so if we really must reject it, I may need some help understanding why. It's been stated that a non-null value type means something slightly different from a non-null reference type, but I'm not convinced of this; it's just that sometimes you have the technical ability to conjure a "default" instance and sometimes you don't, but nullness of the type means what it means either way.
    * I think if we plan to go this way (.val), and then we one day
    have a nullable types feature, some things will then be
    permanently gross that I would hope we can avoid. For example,
    nullness *also* demands the concept of bidirectional projection
    of type variables, and for very overlapping reasons. This puts
    things in a super weird place.
