Some further thoughts on the nature of bang, question, ref, and val.
The model outlined in my mail from yesterday accounted for the
distinction between class and type, but left something important out:
carriers. Adding these into the mix, I think this clarifies why `.val`
and `!` are different, and why `!` and `?` are not pure inverses.
The user declares _classes_, which includes identity and value classes.
Ignoring generics for the moment, we derive _types_ from classes.
Identity classes give rise to a single principal type (whose name is the
written the same as the class, but let's call this `C.ref` for clarity);
value classes give rise to two principal types, `C.ref` and `C.val`.
So `val` and `ref` are functions from Class to Type (val is partial):
val :: ValueClass -> Type
ref :: Class -> Type
What's missing is Carrier. Ignoring the legacy primitive carriers (I,
J, F, D), we have two carriers, L and Q. Every type has a carrier. For
the "ref" types, the carrier is L; for the "val" types, the carrier is Q:
carrier ref T = L
carrier val T = Q
Now, bang and question. These are operators on types. Bang restricts
the value set; question (potentially) augments the value set to include
null. Question is best describe as yielding a union type: `T? ===
T|Null`. (Note that for all reference types T, T|Null == T, because
Null <: T.)
What are the carriers for bang and question types? We define the
carrier on union types by taking the stronger of the two carriers:
carrier T|U = max (carrier T) (carrier U)
which means that
carrier question T = L
since we need an L carrier to represent null. But for "bang", we can
preserve the carrier, since we're representing fewer values:
carrier bang T = carrier T
(Why wouldn't we downgrade the carrier of `Point!` to Q? Because the
carrier means more than nullity; it affects atomicity, layout,
initialization strategy, etc.)
What this means is that `question` is always information-losing, and that:
carrier bang question T = L
carrier question bang T = L
So, the ugly fact here is that "bang" and "question" are not inverses;
`T!?` is not always T, nor is `T?!`.
But what I want to know is this: how do we want to denote "T or null",
when T is a type variable? This turns out to be the only place we
currently have to utter `.ref`. And uttering `.ref` here feels like
asking the user to do the language's job; what the user wants is to
describe the union type "T|Null". (Since the only sensible
representation for this is a reference type, the language will translate
it as such anyway, but that's the language's job.)
This is related to how we ask people to describe "nullable int". There
are three choices: `int?`, `int.ref`, and `Integer`. I would argue that
the first is closest to what the user wants: a statement about value
sets. `int.ref` brings in carriers, which is unrelated to what the user
really wants here; `Integer` is even worse because the relationship
between int and Integer is ad-hoc. Of course, they will all translate
the same way (the L carrier), but that's the compiler's job.
For the only remaining use of `.ref` (returning V.ref from Map::get and
friends), I think we want the same; Map::get wants to return "V or
null". Again, ref-ness is a dependent thing, not the essence; the
essence is "T|Null". (Also there's a connection with type patterns,
where we may want to expand a null-rejecting type pattern to a
null-including one.)
The problem, of course, is that once people see `?`, they will think it
is "obvious" that we left out "!" by mistake, because of course they go
together. But they don't, really; they're different things. But let's
set bang aside, and turn to Kevin's next question, which is: if `?` is a
union type with the null type, what does that say about `String?`? This
seems to be on a collision course, in that null-analysis efforts would
want to treat `String?` as "String, with explicit nullness", but the
union interpretation will collapse to just `String`.
Which points the way towards what seems the proper role for bang and
question in the surface syntax, if any: to *modify* types with respect
to their inclusion of null. So `String?` and `int!` should probably be
errors, since String is already nullable and int is already non-nullable.
Bottom line: as we've discovered half a dozen times already in this
project, nearly every time we think that nullity is perfectly correlated
to something, we discover it is not. Bang/question are not val/ref; we
might be able to get away with using `int.ref` to describe nullable
ints, but that doesn't help us at all with nullable or non-nullable type
patterns; and none of these are the same as "known vs unknown nullity"
(or known vs unknown initialization status.)
On 6/27/2022 2:48 PM, Brian Goetz wrote:
I've been bothered by an uncomfortable feeling that .val and ! are
somehow different in nature, but haven't been able to put my finger on
it. Let me make another attempt.
The "bang" and "question" operators operate on types. In the
strictest form, the bang operator takes a type that has null in its
value set, and returns a type whose value set is the same, except for
null. But observe that if the value set contains null, then the type
has to be a reference type. And the resulting type also has to be a
reference type (except maybe for weird classes like Void) because
we're preserving the remaining values, which are references. So we
could say:
bang :: RefType -> RefType
Bang doesn't change the ref-ness, or id-ness, of a type, it just
excludes a specific value from the value set.
Now, what do ref and val do? They don't operate on types, they
operates on _classes_, to produce a type. Val can only be applied to
value classes, and produces a value type. In the strictest
interpretation (for consistency with bang), ref also only operates on
value classes. So:
val :: ValClass -> ValType
ref :: ValClass -> RefType
Now, we've been strict with bang and ref to say they only work when
they have a nontrivial effect, and could totalize them in the obvious
way (ref is a no-op on an id class; bang is a no-op on a value type.)
Which would give us:
bang :: Type -> Type
val :: ValClass -> ValType
ref :: Class -> RefType
with the added invariant that bang preserves id-ness/val-ness/ref-ness
of types.
But still, bang and ref operate on different things, and and produce
different things; one takes a type and yields a slightly refined type
with similar characteristics, the other takes a class and yields a
type with highly specific characteristics. We can conclude a lot from
`val` (its a value type, which already says a lot), but we cannot
conclude anything other than non-nullity from `bang`; it might be a
ref or a val type, it might come from an identity or value class.
What this says to me is "val is a subtype of bang"; all vals are
bangs, but not all bangs are vals.
A harder problem is what to do about `question`. The strict
interpretation says we can only apply `question` to a type that is
already non-null. In our world, that's ValType.
question :: ValType -> Type
Or we could totalize as we did with bang, and we get an invariant that
question preserves id-ness, val-ness, ref-ness. But, what does
`question` really mean? Null is a reference. So there are two
interpretations: that question always yields a reference type (which
means non-references need to be lifted/boxed), or that question yields
a union type.
It turns out that the latter is super-useful on the stack but kind of
sucks in the heap. The return value of `Map::get`, which we've been
calling `T.ref`, really wants a union type (T or Null); similarly,
many difficult questions in pattern matching might be made less
difficult with a `T or Null` Type. But there is no efficient
heap-based representation for such a union type; we could use tagged
unions (blech) or just fall back to boxing. Which leaves us with the
asymmetry that bang is representation-preserving (as well as other
things), but question is not. (Which makes sense in that one is
subtractive and the other is additive.)
So, to your question: is this permanently gross? I think if we adopt
the strictest intepretations:
- bang is only allowed on types that are already nullable
- question is only allowed on types that are not nullable (or on type
variables)
- val is only allowed on value classes
- ref is only allowed on value classes (or on type variables)
(And we can possibly boil away the last one, since if we can say `T?`,
there is no need for `T.ref` anywhere.)
What this means is that you can say `String!`, but not `Optional!`,
because Optional is already null-free. Which means there is never any
question whether you say `X.val` or `X!` or `X.val!` (or `X.ref!` if
we exclude ref entirely). So then, rather than two ways to say the
same thing, there are two ways to say two different things, which have
different absolute strengths.
This is somewhat unfortunate, but not "permanently gross."
If we drop `ref` in favor of `?` (not necessarily a slam-dunk), we can
consider finding another way to spell `.val` which is less intrusive,
though there are not too many options that don't look like line noise.
On 6/15/2022 12:41 PM, Kevin Bourrillion wrote:
* I still am saddled with the deep feeling that ultimate victory here
looks like "we don't need a val type, because by capturing the
nullness bit and tearability info alone we will make /enough/ usage
patterns always-optimizable, and we can live with the downsides". To
me the upsides of this simplification are enormous, so if we really
must reject it, I may need some help understanding why. It's been
stated that a non-null value type means something slightly different
from a non-null reference type, but I'm not convinced of this; it's
just that sometimes you have the technical ability to conjure a
"default" instance and sometimes you don't, but nullness of the type
means what it means either way.
* I think if we plan to go this way (.val), and then we one day
have a nullable types feature, some things will then be
permanently gross that I would hope we can avoid. For example,
nullness *also* demands the concept of bidirectional projection
of type variables, and for very overlapping reasons. This puts
things in a super weird place.