Hi Remi,
I've been thinking along similar lines in the past few weeks.

I think that, as with every approach, there are pros and cons to what you propose. In a way, the difference between the type-based and the storage-based approaches reminds me of the distinction between homogeneous and heterogeneous translation strategies for generic reification (for more details, [1] is a good read).

In the storage-based model, the user of a generic class doesn't know whether a type variable will be used 20 levels down the stack; this calls, I think, for some sort of type-passing approach, where the generic type information is made available when the class is created, but not necessarily acted upon by the JVM. That is, an object whose static type is `Foo<Point>` is just an object whose type is `Foo` and which has its "type token" saved somewhere (in my thesis [2] I did that with an indirection in the oop, an approach that is sometimes referred to as near/far classes). You need to pass this information around everywhere, because you don't know who is going to use it (e.g. in your strategy, which class is going to use some `T.flat`). Granted, in the model you propose it would be possible to see whether a generic class uses `T.flat` at all and, if it doesn't, maybe no type token is required, but that's an orthogonal optimization. As there's only one `Foo` (albeit used with or without side type information), it is a bit easier to deal with pathologically polymorphic cases such as wildcards, or with use cases where type information is either missing or not fit for purpose (think of javac inferring a grotesque non-denotable type in a generic method call).

One last point: in the storage-based model, clients do not have to opt in to get a version of `Foo<Point>` that exhibits some flatness features. It's up to the owner of `Foo` to decide whether to use `.flat` inside it or not. This can be seen as a pro or a con: on the one hand, there's no need to rewrite client code to take advantage of specialization (good!); on the other hand, it is impossible for a client to make sure that existing code keeps behaving like it did in the past (bad!).
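To make the type-passing idea above a bit more concrete, here is a minimal sketch in today's Java, assuming the type token is simply passed to the constructor and stored in a field (rather than through an oop indirection as in [2]). There is only one runtime class `Foo`, and code far down the stack can recover the type argument from the token. The names `Foo`, `typeToken` and `describe` are illustrative only.

```java
import java.util.Objects;

// Hypothetical sketch of type passing with a homogeneous translation:
// there is one runtime class Foo, and each instance carries its type
// argument as a side "type token". Names are illustrative only.
final class Foo<T> {
    private final Class<T> typeToken; // side type information
    private T value;

    Foo(Class<T> typeToken) {
        this.typeToken = Objects.requireNonNull(typeToken, "typeToken");
    }

    Class<T> typeToken() {
        return typeToken;
    }

    void set(T v) {
        // Code deep down the stack can consult the token; here it is used
        // to check the value against the real type argument.
        this.value = typeToken.cast(v);
    }

    T get() {
        return value;
    }
}

final class TypePassingDemo {
    // A helper "20 levels down" that only sees Foo<?> but can still
    // recover the type argument from the token.
    static String describe(Foo<?> foo) {
        return "Foo of " + foo.typeToken().getSimpleName();
    }

    public static void main(String[] args) {
        Foo<String> a = new Foo<>(String.class);
        Foo<Integer> b = new Foo<>(Integer.class);
        System.out.println(a.getClass() == b.getClass()); // true: one runtime class
        System.out.println(describe(a));                  // Foo of String
        System.out.println(describe(b));                  // Foo of Integer
    }
}
```

The price is visible in the sketch: every instance pays for the token, whether or not any code ever looks at it.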

Conversely, in the type-driven approach, simply "uttering" a specialized type like `Foo<Point.val>` brings a new runtime type into existence, possibly with a different layout. In this world it's easier to see where the type information flows (as Brian pointed out), as that's part of the type signature. Also, since `Foo<Point.val>` is its own little class (or species), you get a place to store type-static metadata for free. For instance, the type parameter `Point.val` might be represented as a static field of type `Class<?>` inside the `Foo<Point.val>` species. Overall, a type-driven approach seems to fit better with the physics of the VMs we have, given that different parameterizations can be given different runtime types, thus avoiding some of the profile pollution that is otherwise hard to address with a storage-based approach (something similar has been discussed for Scala miniboxing, see [3]). That said, in this model, dealing with the absence of type information can be tricky, as shown in [4]. As noted above, clients here need to explicitly opt in to specialization to take advantage of it. Creating `Foo<Point>` is one thing, creating `Foo<Point.val>` is another, and clients can decide whether they are OK with the costs associated with specialization.
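The species idea can be mimicked by hand in today's Java. The sketch below assumes two hand-written classes standing in for the species `Foo<int>` and `Foo<String>`; each gets its own static `Class<?>` field for the type argument and its own field layout. The names `FooSpecies`, `FooOfInt` and `FooOfString` are hypothetical, and this is not meant to reflect any actual translation scheme.

```java
// Hypothetical sketch: in a type-driven model each parameterization is its
// own runtime class (a "species"). We hand-write two species of a Foo to
// show where per-species static metadata could live. The names FooSpecies,
// FooOfInt and FooOfString are illustrative, not a real translation scheme.
abstract class FooSpecies {
    abstract Class<?> typeArgument();
}

final class FooOfInt extends FooSpecies {
    // Stored once per species (static), not once per instance.
    static final Class<?> TYPE_ARG = int.class;
    int value; // this species can use a primitive, null-free field

    @Override
    Class<?> typeArgument() {
        return TYPE_ARG;
    }
}

final class FooOfString extends FooSpecies {
    static final Class<?> TYPE_ARG = String.class;
    String value; // this species keeps a reference field

    @Override
    Class<?> typeArgument() {
        return TYPE_ARG;
    }
}

final class SpeciesDemo {
    public static void main(String[] args) {
        FooSpecies a = new FooOfInt();
        FooSpecies b = new FooOfString();
        System.out.println(a.getClass() == b.getClass()); // false: distinct runtime types
        System.out.println(a.typeArgument());             // int
        System.out.println(b.typeArgument());             // class java.lang.String
    }
}
```

Because the two species are distinct classes, a JIT can keep separate type profiles for them, which is the profile-pollution argument in a nutshell.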

Finally, as Brian pointed out, under the storage-based translation, in order for things to work when type information is missing, you have to assume that `T.flat` doesn't really mean "flat all the time", but only "flat if you can". That is, if there's some side channel available, then read T's true form from there; otherwise just take T's erasure and use that. That said, this problem is not unique to this approach. Consider:

```
class Foo<X> {
   X x;
}

class Sub<X> extends Foo<X> { ... }
```

Under the type-driven approach, if I create `Sub<Point.val>`, I'd expect that species to have a super-species `Foo<Point.val>` (which means `x` will have the sharp type `Point.val`). But if I create `Sub<String>`, then the super-species is just the erased `Foo`, and the type of `x` is simply `Object`. So the "flat if you can" behavior is there even in the type-driven approach (e.g. `extends Foo<X>` doesn't mean the same thing in all cases), perhaps more disguised.
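The "flat if you can" rule can be sketched with reflection in today's Java, assuming the side channel is modelled as an `Optional<Class<?>>` that may or may not carry T's true type, and that a primitive `int[]` stands in for flat storage. The class and method names are hypothetical; only `java.lang.reflect.Array.newInstance` is a real API.

```java
import java.lang.reflect.Array;
import java.util.Optional;

// Hypothetical sketch of "flat if you can": if a side channel carries T's
// true type, allocate storage for it; otherwise fall back to T's erasure
// (Object). The Optional-based side channel is an illustrative assumption.
final class FlatIfYouCan {
    static Class<?> storageType(Optional<Class<?>> sideChannel) {
        return sideChannel.orElse(Object.class);
    }

    // Array.newInstance(int.class, n) yields an int[] ("flat" storage),
    // Array.newInstance(Object.class, n) yields an erased Object[].
    static Object allocate(Optional<Class<?>> sideChannel, int length) {
        return Array.newInstance(storageType(sideChannel), length);
    }

    public static void main(String[] args) {
        Object withInfo = allocate(Optional.of(int.class), 4);
        Object withoutInfo = allocate(Optional.empty(), 4);
        System.out.println(withInfo.getClass().getSimpleName());    // int[]
        System.out.println(withoutInfo.getClass().getSimpleName()); // Object[]
    }
}
```

The same generic code ends up with two different layouts depending only on whether the side information happened to be present.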

Overall, I don't think either model is "clearly" better than the other: they have different trade-offs, which might work better in some contexts and worse in others. What we pick depends primarily, I think, on whether we see specialization as a conscious, opt-in decision made by the user, or as something happening "under the hood" (or, put in better terms, under the control of library developers). While the latter sounds attractive, some aspects of the specialized generic type system will unfortunately result in seams (e.g. new NullPointerExceptions) that are _visible_ to clients. So encapsulating specialization choices is not something that can be achieved 100%, and I think that is what some of us might feel uncomfortable about.
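A rough analogy in today's Java shows how such a seam reaches clients. Assume a library author silently switches a field from a nullable reference to a primitive (a stand-in for flat, null-free storage) while keeping the same `Integer` parameter: client code that used to pass `null` now gets an NPE from auto-unboxing. `ErasedBox` and `FlatIntBox` are hypothetical names, and this is only an analogy, not how specialization would be implemented.

```java
// Hypothetical analogy in today's Java: a storage choice made by the
// library author (a primitive, null-free field instead of a nullable
// reference) becomes visible to clients as a new NullPointerException.
// Class names are illustrative; this is not how Valhalla implements
// specialization.
final class ErasedBox {
    private Object value; // nullable reference storage

    void set(Integer v) {
        value = v; // null is fine here
    }

    Object get() {
        return value;
    }
}

final class FlatIntBox {
    private int value; // "flat" storage: there is no room for null

    void set(Integer v) {
        value = v; // auto-unboxing throws NullPointerException when v is null
    }

    int get() {
        return value;
    }
}

final class SeamDemo {
    public static void main(String[] args) {
        new ErasedBox().set(null); // succeeds
        try {
            new FlatIntBox().set(null);
        } catch (NullPointerException e) {
            System.out.println("client sees a new NPE");
        }
    }
}
```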

Maurizio


[1] - https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.53.8658&rep=rep1&type=pdf
[2] - http://amsdottorato.unibo.it/2476/
[3] - https://www.semanticscholar.org/paper/Compile-Time-Type-Driven-Data-Representation-in-Ureche/df5831814318ff11d189c4de0485745603fb7afe
[4] - http://cr.openjdk.java.net/~jrose/values/parametric-vm.html







On 20/07/2022 21:05, [email protected] wrote:
----- Original Message -----
From: "Brian Goetz" <[email protected]>
To: "Remi Forax" <[email protected]>
Cc: "valhalla-spec-experts" <[email protected]>
Sent: Wednesday, July 20, 2022 7:34:04 PM
Subject: Re: The storage hint model
Yes, I know, we have already discussed several models like that. But I think it's
a good idea to re-examine them, because I believe they are more attractive
today.
Indeed, this has come up several times.  It is attractive to think of flattening
entirely as a ‘storage class’, and fair to reexamine it (this also came up in
an internal discussion recently), but I think in the end this will still be a
choice that we regret.

The main issue with the .val model is that it presents two *types* to the user,
while what we really want is mostly to flatten the storage and have a precise
method calling convention.
Those two goals are not equal; the first is far more important than the second,
to the point where the coding guideline proposed by Brian is to use .ref for
the parameters and .val for the fields and arrays.
FTR, the motivation for the guideline here is “use .val where it makes the
most difference.”  There’s nothing *wrong* with using val types on the stack;
you just don’t get the enormous payback you do with heap variables.  But I can
imagine (especially in a specialized-generics world) that there is value in
using .val in APIs as well, because it carries the “not null” semantic
information as well as the flattening hint.
T.flat carries the same semantics; the difference is that you have to
explicitly use T.flat where you want flattening in the generic code.

class Container<T> {
   private T.flat value;  // here

   public void set(T.flat value) {  // but also here
     this.value = value;
   }

   public T.flat get() {  // and here too
     return value;
   }
}

So yes, it makes the generic code more cumbersome to write, but it also makes
generic classes easier to use, because the writer of the generic class decides
what can be flattened (or not), not the user of the generic class.

We still need .val and .ref to be able to specialize generics, right? No, I
don't think so. Technically, we do not have to pass a .val as a type argument to
be able to specialize a generic class; we just need to pass a type argument
that can be flattened if possible.
Here’s where I disagree.  If field declaration and array creation expressions
were the only places you needed to say .val, I’d be much more sympathetic to
the container-properties model.  But in a world with specialized generics, we
want to flow the types throughout, not only to field layout, but flowing the
non-null constraint to the JIT, etc.  The `T.flat` approach will feel like a
hack, because it is, and as an unbonus, people will forget almost all the time
because having to select a storage class for an abstractly typed variable will
feel unnatural.
People will forget T.flat as much as they will forget C.flat (C.val if you
prefer), that's true, but that's the price to pay for being safe by default, in
both cases.
If you want to "fix" a potentially missing T.flat, it's the same fix as for a
potentially missing C.flat: have a way to declare a value class flat by default
at the declaration site. But that's a separate discussion.

When I say ArrayList<Foo.val>, I want the properties of
Foo.val to flow to *all* the places where a T is being moved around.
Maybe you want that, maybe you don't. Here is an interesting implementation of
ArrayList:

import java.util.Arrays;

public class ArrayList<E> {
   private E[] array;
   private int size;

   public ArrayList() {
     array = new E.flat[16];   // ahah, flat by default!
   }

   public boolean add(E element) {  // E is not flat
     // isNullable() is a hypothetical method: can this array store null?
     if (element == null && !array.getClass().isNullable()) {
       var newArray = new E[array.length];  // need to store null, use a nullable array
       System.arraycopy(array, 0, newArray, 0, array.length);
       array = newArray;
     }
     if (array.length == size) {
       array = Arrays.copyOf(array, size * 2);
     }
     array[size++] = element;
     return true;
   }
}

It starts with a flat array and, if a null element is added, it "unflattens" itself.
This implementation is interesting because, once recompiled with the new generics,
a new ArrayList<Integer>() will use a flattened array by default.
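The "flat until a null shows up" strategy can be emulated in today's Java by hand-specializing for `Integer`. In this sketch, assumed for illustration only, a primitive `int[]` stands in for the flat array and an `Integer[]` for the nullable one; the first `null` triggers a one-time copy into the nullable array. `FlatFirstIntList` is a hypothetical name, not a JDK class, and nothing here says anything about the performance question raised below.

```java
import java.util.Arrays;

// Hypothetical emulation of the "flat until a null shows up" list, specialized
// by hand for Integer: elements start in a primitive int[] (standing in for a
// flat array) and migrate to an Integer[] (a nullable array) the first time
// null is added. FlatFirstIntList is an illustrative name, not a JDK class.
final class FlatFirstIntList {
    private int[] flat = new int[16]; // used while no null has been added
    private Integer[] boxed;          // nullable storage, allocated on demand
    private int size;

    boolean add(Integer element) {
        if (element == null && boxed == null) {
            // "Unflatten": copy the current elements into a nullable array.
            boxed = new Integer[flat.length];
            for (int i = 0; i < size; i++) {
                boxed[i] = flat[i];
            }
            flat = null;
        }
        if (boxed != null) {
            if (boxed.length == size) {
                boxed = Arrays.copyOf(boxed, size * 2);
            }
            boxed[size++] = element;
        } else {
            if (flat.length == size) {
                flat = Arrays.copyOf(flat, size * 2);
            }
            flat[size++] = element; // safe unboxing: element is non-null here
        }
        return true;
    }

    Integer get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        if (boxed != null) {
            return boxed[index];
        }
        return flat[index]; // boxed on return
    }

    boolean isFlat() {
        return boxed == null;
    }

    int size() {
        return size;
    }
}
```

After `add(1)` and `add(2)` the list still reports `isFlat() == true`; a subsequent `add(null)` switches it to the nullable representation for good.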

I have no idea about the performance of this kind of implementation, but using
T.flat gives better control over what is flattenable or not in the implementation.

(This scheme rests on a clever but implicit assumption: that `T.flat` really
means “as flat as T can be”, which for a ref is “not at all.”  It’s clever, but
for this reason `T.flat` is kind of a misnomer.)
If it's a value class, T.flat can still flatten the value if its size is <= 128
bits, but yes, T.flat means as flat as T can be.
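Read literally, "as flat as T can be" is a layout decision function. The sketch below encodes it under two stated assumptions: identity (reference) types are never flattened, and value types are flattened only up to the 128-bit bound mentioned above. That bound is treated here as a given threshold, not as a documented VM rule, and `FlatteningPolicy` and `layoutFor` are hypothetical names.

```java
// Hypothetical policy sketch for "T.flat means as flat as T can be":
// identity (reference) types are never flattened, and value types are
// flattened only when their payload fits a size limit. The 128-bit limit
// is taken from the message above and is an assumption, not a VM rule.
final class FlatteningPolicy {
    static final int MAX_FLAT_BITS = 128; // assumed threshold

    enum Layout { REFERENCE, FLAT }

    static Layout layoutFor(boolean isValueClass, int payloadBits) {
        if (!isValueClass) {
            return Layout.REFERENCE; // for a ref, "as flat as it can be" is "not at all"
        }
        return payloadBits <= MAX_FLAT_BITS ? Layout.FLAT : Layout.REFERENCE;
    }

    public static void main(String[] args) {
        System.out.println(layoutFor(false, 32)); // REFERENCE
        System.out.println(layoutFor(true, 64));  // FLAT
        System.out.println(layoutFor(true, 192)); // REFERENCE
    }
}
```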

We can write instead:
  value class C {
    // ...
  }

  class Container<T> {
    private T.flat value;
Yeah, this is where you lose me.  When you’re writing a generic class like
ArrayList<T>, you’re abstracted from the details of heap layout, and it seems
overwhelmingly likely you’d forget to say T.flat somewhere.  It also feels very
“nonparametric”, because we’ve created a second, ad-hoc channel through which
information flows, and that channel is “bumpier”.  But it’s worse than that,
because there’s less type information in the program, and therefore the VM has
to make more conservative assumptions about nullity.
This has been true of the previously proposed storage hint models, but unlike
those, this model allows parameters to be declared as T.flat.
I think it is the missing piece: by propagating T.flat, the VM has enough
information that it does not need to make conservative assumptions.

I get what you are trying to accomplish; the ref/val distinction feels like it
is almost something we can get rid of.  But I think swapping it for a storage
class model is worse, because it is asking users to think about low-level
details in more places, rather than using types and having the information flow
with the types.
In more places inside the generic code, but in fewer places inside the user code.
It's a trade I'm happy to make.

And as you point out, it means there are more possible ways
nulls can get deeper into the system before NPEing.
Yes, it can be as late as reaching a putfield, but that's because, as a class
writer, you have more control.
For example, with List.of(), which never allows null, delaying the NPE may provide
better error messages: a requireNonNull inside the method may be better than the
NPE at the call site that List.<C.val>of(null) would produce.
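As a small illustration of the error-message argument, here is a sketch of a factory that performs its own null check with `Objects.requireNonNull(obj, message)`, so the exception names the offending position. `listOf` is a hypothetical helper written for this example; it is not `java.util.List.of`, and nothing here claims how the JDK method reports its failures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: the library checks for null itself, so the failure
// carries a message naming the offending element instead of a bare NPE.
// listOf is an illustrative helper, not java.util.List.of.
final class NullChecks {
    @SafeVarargs
    static <E> List<E> listOf(E... elements) {
        List<E> result = new ArrayList<>(elements.length);
        for (int i = 0; i < elements.length; i++) {
            result.add(Objects.requireNonNull(elements[i],
                    "element at index " + i + " is null"));
        }
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args) {
        try {
            listOf("a", null, "c");
        } catch (NullPointerException e) {
            System.out.println(e.getMessage()); // element at index 1 is null
        }
    }
}
```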

Rémi
