Re: The storage hint model

Maurizio Cimadamore Thu, 21 Jul 2022 04:30:09 -0700

Hi Remi,
I've been thinking along similar lines in the past few weeks.

I think that, as with every approach, there are pros and cons to what you propose. In a way the difference between the type-based and the storage-based approaches reminds me of the distinction between homogeneous and heterogeneous generic reification translation strategies (for more details, please refer to the good read in [1]).

In the storage-based model, the user of a generic class doesn't know if a type-variable will be used 20 levels down the stack; this calls, I think, for some sort of type-passing approach, where the generic type information is made available when the class is created, but not necessarily acted upon by the JVM. That is, an object whose static type is `Foo<Point>` is just an object whose type is `Foo` which has its "type-token" saved somewhere (in my thesis [2] I did that with an indirection in the oop - an approach that is sometimes referred to as near/far classes). You need to pass this info around everywhere because you don't know who's gonna use this information (e.g. in your strategy, which class is going to use some T.flat). Granted, in the model you propose it would be possible to see if a generic class uses T.flat at all, and, if it doesn't, maybe no type token is required - but that's an orthogonal optimization. As there's only one Foo (albeit used w/ or w/o side type information), it is a bit easier to deal with pathologically polymorphic cases such as wildcards, or to deal with use cases where type information is either missing, or not fit for purpose (think javac inferring a grotesque non-denotable type in a generic method call). One last point: in the storage-based model, clients do not have to opt in to get a version of `Foo<Point>` that exhibits some flatness features. It's up to the owner of `Foo` to decide whether to use `.flat` inside it or not. This can be seen as a pro or a con: on the one hand, there's no need to rewrite client code to take advantage of specialization (good!) - on the other hand, it is impossible for a client to make sure that existing code keeps behaving like it did in the past (bad!).
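To make the type-passing idea a bit more concrete, here is a rough sketch in present-day Java. It is only an approximation under stated assumptions: `TokenFoo`, `erased()` and `storageStrategy()` are hypothetical names, the type token is an ordinary field rather than an indirection in the oop, and `Class.isPrimitive()` merely stands in for "this type could be flattened".

```java
// Hypothetical stand-in for a generic class under a type-passing scheme:
// there is a single runtime class, and the type argument travels along as
// an optional token that code may or may not consult.
final class TokenFoo<T> {
    // null models the case where type information is missing
    // (raw use, wildcard, non-denotable inferred type).
    private final Class<?> typeToken;

    TokenFoo(Class<?> typeToken) {
        this.typeToken = typeToken;
    }

    static <T> TokenFoo<T> erased() {
        return new TokenFoo<>(null);
    }

    boolean hasTypeToken() {
        return typeToken != null;
    }

    // Consult the side channel if it is there, otherwise fall back to the
    // erased representation. isPrimitive() is an assumption standing in
    // for "can be flattened".
    String storageStrategy() {
        return (typeToken != null && typeToken.isPrimitive()) ? "flat" : "reference";
    }
}

final class TokenFooDemo {
    public static void main(String[] args) {
        TokenFoo<Integer> withToken = new TokenFoo<>(int.class);
        TokenFoo<Integer> withoutToken = TokenFoo.erased();

        // Both objects share one runtime class; only the side information differs.
        System.out.println(withToken.getClass() == withoutToken.getClass());
        System.out.println(withToken.storageStrategy());
        System.out.println(withoutToken.storageStrategy());
    }
}
```

Note how the client never opts in: whether the token is consulted at all is decided inside `TokenFoo`, which mirrors the point about the owner of `Foo` being in control.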

Conversely, in the type-driven approach, simply "uttering" a specialized type like `Foo<Point.val>` brings a new runtime type into existence, possibly with a different layout. In this world it's easier to see where the type information is flowing (as Brian pointed out), as that's part of the type signature. Also, since `Foo<Point.val>` is its own little class (or species), you get a place to store type-static metadata for free. For instance, the type parameter `Point.val` might be represented as a static field of type `Class<?>` inside the `Foo<Point.val>` species. Overall, a type-driven approach seems to fit better with the physics of the VMs we have, given that different parameterizations can be given different runtime types, thus avoiding some of the profile pollution that is otherwise hard to address when using a storage-based approach (something similar has been discussed for Scala miniboxing, see [3]). That said, in this model, dealing with absence of type information can be tricky, as shown in [4]. As noted above, clients here need an explicit opt-in into specialization to take advantage of it. Creating `Foo<Point>` is one thing, creating `Foo<Point.val>` is another, and clients can decide if they are ok with the costs associated with specialization.
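As a loose illustration of the species idea, the sketch below models it with a plain registry in today's Java. Everything here is an assumption made for the example: `Species` and `Species.of` are hypothetical, a real VM would not use a string-keyed map, and `int.class` / `Integer.class` merely play the roles of `Point.val` / `Point`.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a species: one runtime entity per
// (generic class, type argument) pair, carrying the type argument as
// metadata that every instance of that parameterization can share.
final class Species {
    // String keys are good enough for a sketch; they ignore class loaders.
    private static final Map<String, Species> CACHE = new ConcurrentHashMap<>();

    final Class<?> genericClass;
    final Class<?> typeArgument; // the "type-static metadata"

    private Species(Class<?> genericClass, Class<?> typeArgument) {
        this.genericClass = genericClass;
        this.typeArgument = typeArgument;
    }

    // "Uttering" a parameterization brings its species into existence,
    // and uttering it again yields the same species.
    static Species of(Class<?> genericClass, Class<?> typeArgument) {
        String key = genericClass.getName() + "<" + typeArgument.getName() + ">";
        return CACHE.computeIfAbsent(key, k -> new Species(genericClass, typeArgument));
    }
}

final class SpeciesDemo {
    public static void main(String[] args) {
        Species flatList = Species.of(List.class, int.class);      // plays List<Point.val>
        Species boxedList = Species.of(List.class, Integer.class); // plays List<Point>

        // Different parameterizations get distinct runtime identities, which is
        // what lets per-species profiles stay unpolluted.
        System.out.println(flatList != boxedList);
        System.out.println(flatList == Species.of(List.class, int.class));
        System.out.println(flatList.typeArgument);
    }
}
```

The client-side opt-in shows up in the call to `Species.of`: the caller, not the generic class, picks which parameterization to create.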

Finally, as Brian pointed out, under the storage-based translation, in order for things to work when type information is missing, you have to assume that T.flat doesn't really mean "flat all the time", but only "flat if you can". That is, if there's some side-channel available, then read T's true form from there, otherwise just take T's erasure and use that. That said, this problem is not entirely new in this approach. Consider:


```
class Foo<X> {
   X x;
}

class Sub<X> extends Foo<X> { ... }
```

Under the type-driven approach, if I create `Sub<Point.val>`, I'd expect that species to have a super-species `Foo<Point.val>` (which means `x` will have sharp type `Point.val`). But if I create `Sub<String>`, then the super-species is just erased Foo, and the type of `x` is simply Object. So, the "flat if you can" behavior is there even in the type-driven approach (e.g. `extends Foo<X>` doesn't mean the same thing in all cases), perhaps more in disguise.

Overall, I don't think either model is "clearly" better than the other - they have different trade-offs which might work better in some contexts and worse in others. What we pick depends primarily, I think, on whether we see specialization as a conscious, opt-in decision performed by the user, or if we see specialization more as something happening "under the hood" (or, put in better terms, under control of library developers). While the latter sounds attractive, some aspects of the specialized generic type system unfortunately will result in seams (e.g. new NullPointerExceptions) which are _visible_ to clients. So encapsulating specialization choices is not something that can be achieved 100%, and I think that is what some of us might feel uncomfortable about.


Maurizio

[1] - https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.53.8658&rep=rep1&type=pdf

[2] - http://amsdottorato.unibo.it/2476/

[3] - https://www.semanticscholar.org/paper/Compile-Time-Type-Driven-Data-Representation-in-Ureche/df5831814318ff11d189c4de0485745603fb7afe

[4] - http://cr.openjdk.java.net/~jrose/values/parametric-vm.html







On 20/07/2022 21:05, [email protected] wrote:

----- Original Message -----

From: "Brian Goetz" <[email protected]>
To: "Remi Forax" <[email protected]>
Cc: "valhalla-spec-experts" <[email protected]>
Sent: Wednesday, July 20, 2022 7:34:04 PM
Subject: Re: The storage hint model

Yes, i know, we have already discussed several models like that. But i think, it's
a good idea to re-examine those because i believe they are more attractive
today.

Indeed, this has come up several times.  It is attractive to think of flattening
entirely as a ’storage class’, and fair to reexamine it (this also came up in
an internal discussion recently) but I think in the end this still will be a
choice that we regret.

The main issue with the .val model is that it presents two *types* to the user
while what we really want is mostly to flatten the storage and have a precise
method calling convention.
Those two goals are not equal, the first is far more important than the second,
to the point where the coding guideline proposed by Brian is to use .ref for
the parameters and .val for the fields and arrays.

FTR, the motivation for the guideline here is “use .val where it makes the
most difference.”  There’s nothing *wrong* with using val types on the stack,
you just don’t get the enormous payback you do with heap variables.  But I can
imagine — especially in a specialized-generics world — that there is value to
using .val in APIs as well, because it carries the semantic “not null”
information as well as the flattening hint.

T.flat carries the same semantics, the difference is that you have to 
explicitly use T.flat where you want the flattening in the generic code.

class Container<T> {
   private T.flat value;  // here

   public void set(T.flat value) {  // but also here
     this.value = value;
   }

   public T.flat get() {  // and here too
     return value;
   }
}

so yes it makes the generic code more cumbersome to write but it also makes 
generic classes easier to use because the writer of the generics decide what 
can be flattened (or not) and not the user of the generics.

We still need .val and .ref to be able to specialize generics, right? No, i
don't think so, we technically do not have to pass a .val as type argument to
be able to specialize a generic class, we just need to pass a type argument
that can be flattened if possible.

Here’s where I disagree.  If field declaration and array creation expressions
were the only places you needed to say .val, I’d be much more sympathetic to
the container-properties model.  But in a world with specialized generics, we
want to flow the types throughout, not only to field layout, but flowing the
non-null constraint to the JIT, etc.  The `T.flat` approach will feel like a
hack, because it is, and as an unbonus, people will forget almost all the time
because having to select a storage class for an abstractly typed variable will
feel unnatural.

People will forget T.flat as much as they will forget C.flat (C.val if you 
prefer), that's true, but that's the price to pay to be safe by default, in both 
cases.
If you want to "fix" the potential missing T.flat, it's the same fix as with a 
potential missing C.flat, have a way to declare a value class flat by default at 
declaration site. But that's a separate discussion.

When I say ArrayList<Foo.val>, I want the properties of
Foo.val to flow to *all* the places where a T is being moved around.

Maybe you want or maybe you don't, here is an interesting implementation of 
ArrayList

public class ArrayList<E> {
   private E[] array;
   private int size;

   public ArrayList() {
     array = new E.flat[16];   // ahah, flat by default !
   }

   public boolean add(E element) {  // E is not flat
     if (element == null && !array.getClass().isNullable()) {
       var newArray = new E[array.length];  // need to store null, use a nullable array
       System.arraycopy(array, 0, newArray, 0, array.length);
       array = newArray;
     }
     if (array.length == size) {
       array = Arrays.copyOf(array, size * 2);
     }
     array[size++] = element;
     return true;
   }
}

It starts with a flat array and, if a null element is added, it "unflattens" itself.
This implementation is interesting because once recompiled with the new generics, a 
new ArrayList<Integer>() will use a flattened array by default.

I've no idea about the performance of this kind of implementation, but using 
T.flat gives better control over what is flattenable or not in the implementation.
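For readers who want to experiment today, the same "start flat, unflatten on the first null" behaviour can be approximated without any new syntax. This is only a sketch under stated assumptions: `UnflatteningIntList` is a hypothetical name, `int[]` plays the role of the `E.flat[]` array and `Integer[]` plays the role of the nullable `E[]` array; it says nothing about how a real specialized ArrayList would behave or perform.

```java
import java.util.Arrays;

// Present-day approximation of the list above, fixed to Integer elements.
final class UnflatteningIntList {
    private int[] flat = new int[16]; // "flat" storage, cannot hold null
    private Integer[] boxed;          // nullable storage, created on demand
    private int size;

    boolean add(Integer element) {
        if (element == null && boxed == null) {
            // Need to store null: migrate once to the nullable array.
            boxed = new Integer[flat.length];
            for (int i = 0; i < size; i++) {
                boxed[i] = flat[i];
            }
            flat = null;
        }
        if (boxed != null) {
            if (boxed.length == size) {
                boxed = Arrays.copyOf(boxed, size * 2);
            }
            boxed[size++] = element;
        } else {
            if (flat.length == size) {
                flat = Arrays.copyOf(flat, size * 2);
            }
            flat[size++] = element; // element is non-null on this path
        }
        return true;
    }

    Integer get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        if (boxed != null) {
            return boxed[index];
        }
        return Integer.valueOf(flat[index]);
    }

    boolean isFlat() {
        return boxed == null;
    }

    int size() {
        return size;
    }
}

final class UnflatteningIntListDemo {
    public static void main(String[] args) {
        UnflatteningIntList list = new UnflatteningIntList();
        list.add(1);
        list.add(2);
        System.out.println(list.isFlat()); // still on the int[] storage
        list.add(null);
        System.out.println(list.isFlat()); // migrated to Integer[]
        System.out.println(list.get(0) + " " + list.get(1) + " " + list.get(2));
    }
}
```

The migration is one-way in this sketch: once a null has been stored the list stays on the nullable array, even if that null is later overwritten.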

(This scheme rests on a clever but implicit assumption: that `T.flat` really
means “as flat as T can be”, which for a ref, is “not at all.”  It's clever, but
for this reason `T.flat` is kind of a misnomer.)

If it's a value class, T.flat can still flatten the value if the size is <= 128 
bits but yes, T.flat means as flat as T can be.

we can write instead
  value class C {
    // ...
  }

  class Container<T> {
    private T.flat value;

Yeah, this is where you lose me.  When you’re writing a generic class like
ArrayList<T>, you’re abstracted from the details of heap layout, and it seems
overwhelmingly likely you’d forget to say T.flat somewhere.  It also feels very
“nonparametric”, because we’ve created a second, ad-hoc channel through which
information flows, and that channel is “bumpier”.  But it's worse than that,
because there’s less type information in the program, and therefore the VM has
to make more conservative assumptions about nullity.

This has been true of the previously proposed storage hint models, but unlike 
those, this model allows parameters to be declared as T.flat.
I think it is the missing piece: by propagating the T.flat, the VM has enough 
information, so it does not need to make conservative assumptions.

I get what you are trying to accomplish; the ref/val distinction feels like it
is almost something we can get rid of.  But I think swapping it for a storage
class model is worse, because it is asking users to think about low-level
details in more places, rather than using types and having the information flow
with the types.

In more places inside the generic code, in fewer places inside the user code. 
It's a trade i'm happy to make.

And as you point out, it means there are more possible ways
nulls can get deeper into the system before NPEing.

yes, it can be as late as reaching a putField but it's because as a class 
writer you have more control.
For example with List.of() which never allows null, delaying the NPE may provide 
better error messages, a requireNonNull may be better than having an NPE at the 
callsite like List.<C.val>of(null) will do.

Rémi
