Since the Burlington meeting, Simms has been prodding me to offer an alternative to supporting array covariance for value arrays. John and I kicked around the following idea yesterday. At this point, I’d call it a half-baked sketch, but so far it seems promising.

       Why covariance at all?

First, let’s talk about why arrays in Java are covariant in the first place. And the reason is, unless you have either generics or covariant arrays, you can’t write methods that operate on all (or almost all) array types, like Arrays.sort(). With covariance, we can write

|void sort(Object[], Comparator) { … } |

and we can sort any (reference) array. I wasn’t in the room, but I suspect that array covariance was something that was grudgingly accepted because it was the only way to write code that worked on arbitrary arrays.

Now, imagine if we had (invariant) generics in Java 1.0. Then we would have written

|<T> void sort(T[], Comparator<T>) { … } |

This makes more sense from a user-model perspective, but what do we erase |T[]| to? Unless we also had covariant arrays, we’d not be able to erase |T[]| to |Object[]|, but there may be a smaller hammer we can use than covariant arrays for this.

Covariance isn’t bad in itself, but the limitations of Java 1.0 arrays belie the problem — layout polymorphism. Covariance among arrays of Object subtypes is fine because they all have the same layout, but primitive arrays were left out from the beginning. We could extend covariance to value/primitive arrays — we’ve prove its possible — but there’s a cost, which is largely: we take what is a clean and predictable performance model for array access, and pollute it with the possibility of megamorphic access sites.


       What else is missing about arrays?

Just as primitives are outside of the generic type system (we can’t say |ArrayList<int>|, yet), so are arrays. There’s no way to express “any array” in the type system; methods like |System.arraycopy| are forced to use |Object| as a proxy, and then dynamically ensure that what’s passed is actually an array:

|void arraycopy(Object sourceArray, int sourceStart, Object destArray, int destStart, int length) |

It would be much nicer if we could say |AnyArray| instead of |Object| here; it would be saying what we mean, and would allow us to move some type checking from runtime to compilation time. (It also might give us a path to not having to write nine versions of |Arrays.fill()|.)

Similarly, there are lots of operations on arrays that are painful, ad-hoc, and require VM intrinsification to perform well, like |Arrays.copyOf()|.

(Plus, there’s the usual litany of array problems — no final elements, no volatile elements, fixed layout, etc.)


       Arrays are primitive

The reason for having arrays in the language and VM is the same reason we have primitives — they are a compromise of OO principles to gain a practical performance model. There’s few good things about arrays, but one of them is they have simple, transparent cost models in time and space. Polluting |Object[]| with layout-polymorphism compromises the cost model.

The analogy we should be thinking of is:

   array is-to collection as primitive is-to object

That is, arrays are the ground types that our higher-typed programs bottom out in (or specialize to), not necessarily the types we want in people’s faces.


       What would arrays look like in Valhalla?

Obviously, we’d like for value arrays to be supported at least as well as other arrays. Right now, without array covariance, we’re back in the same situation the Java 1.0 designers were — that if you want to write |Arrays.sort()|, you need covariance. Currently we have no way to extend the various nine-way method sets, even if we were willing to write a tenth method.

But, in Valhalla, we don’t want 9- or 10-way method sets — we want a single method. We want for |Arrays.fill()|, for example, to be something like:

|<any T> void fill(T[] array, T element) |

and not have to have eight siblings to clean up the primitive cases.

But, even this syntax is paying homage to the irregularity of array-as-primitive. I think what we’d really want is something like

|interface NativeArray<any T> { int length(); T get(int index); void set(int index, T val); Class<T> componentType(); } |

and then we’d retrofit |Foo[]| to implement |NativeArray<Foo>|. (We’d likely make NA a restricted interface, that is only implemented by native arrays, for now.) Then, our single fill method could be (in LW100):

|<any T> void fill(NativeArray<T> array, T element) |

(Impatient readers will want to point out that we could generify over the index parameter as well. One impossible problem at a time.)

And similarly, arraycopy can be:

|void arraycopy(NativeArray src, int srcStart, NativeArray dest, int destStart, len) |

Bold claim: In such a world, /there is no need for arrays to be covariant/. If you want to operate on more than one kind of array, use generics. If you want covariance, say |NativeArray<? extends T>|.


       Getting there in steps

Suppose we start in steps; let’s start with an erased interface, suitable for later generification:

|interface NativeArray { int length(); Object get(int index); void set(int index, Object val); Class<?> componentType(); } |

We make NA a restricted interface (reject any classes that explicitly try to implement it at class load time, as if it were sealed), and inject it as a super type into all existing array types (including primitives and values.) The language overloads the |[]| operators to bind to the |get| and |set| methods and continues to simulate the synthetic |length| field. Now, someone who wants to write just one version of |Arrays.fill| can do so:

|void fill(NativeArray a, Object element) { for (int i=0; i<a.length; i++) a[i] = element; } |

Note that this works for all arrays now — values, primitives, object, with some boxing. OK, progress.

One obvious wart in the above is that we’ve taken a step back in terms of type safety. We’d like for NA to be generic in its element type, but we can’t yet use values or primitives as type parameters. Suppose we said instead:

|interface NativeArray<T> { int length(); T get(int index); void set(int index, T val); Class<T> componentType(); } |

and treated |Foo[]| (for reference Foo) as implementing |NA<Foo>|, and for values and primitives, treat |V[]| as implementing |NA<V.box>|. This makes our erased generics story work for:

|<T> fill(NativeArray<T> a, T element) |

but lays down some potholes for migration to specialized generics, as when we get to Valhalla, we’d like for |V[]| to implement |Array<V>|, not |Array<V.box>|. So, this is a hole to fill, but it seems a promising direction.

The result is that we have monomorphic type profiles at |aaload| sites, and potentially polymorphic profiles at |invokeinterface NativeArray.get| sites — which is more justifiable than polluting aaload profiles. And when we get to full specialization, many of these polymorphic profiles may flatten out (as we can safely speculate on |T[]| at specialized |NativeArray<T>| sites.)


       Overloading

The |NA| story works nicely with existing overload rules too. If we currently have:

|fill(int[], int) fill(Object[], Object) |

we can add in an overload

|<T> fill(NativeArray<T>, T) |

Existing source and binary code that wants one of the original methods will continue to get it, as |Object[]| is more specific than |NativeArray<? extends Object>| and so will be selected in preference to the NA version. So the new method would only be selected by value instantiations, since none of the existing versions are applicable.

(Over time, we might want to remove some of the hand-specialized versions, and turn them into bridges for the specialized generic version, reducing us from 9 methods to one method with 8 bridges.)


       Translation

The real stumbling block is: what do we do with |T[]| in generic APIs. Currently, we erase these to array-of-erasure-of-T, so usually |Object[]|. This is one of the things pushing us towards covariance. But, can we change this translation? What if we erased |T[]| to NativeArray?

Obviously, we’d have a binary compatibility issue, that we’d have to fill in with bridges. And these bridges might conflict with actual overloads that take |NativeArray|:

|void m(T[] array) { … } void m(NativeArray array) { … } |

but, maybe since there’s no existing code that uses NA, that’s OK.

This triggers all the issues discussed recently regarding migration — there’s a 2x2 matrix of { source, binary } x { client, subclass } compatibility.

When used in argument position, passing an |Object[]| where a NA is expected is fine, since |Object[]| is a subtype of NA. The problem would come when |T[]| is used in return position (or as a field.) But, these are fairly rare, and implementations (at least in the JDK) are well behaved:

|// Stream<T> T[] toArray(IntFunction<T[]> generator) // Arrays <T> T[] copyOf(T[] array) |

So erasing to NA, with bridges, and performing the same generic type checks we do (perhaps with some extra casts) may be a viable path for managing the source compatibility aspects.

OK, so if we erase |T[]| to NativeArray, what does that buy us? It /almost/ means we can write our |Arrays.*| methods with an extra overload:

|<T> void fill(NativeArray<T>, T) |

But, we don’t have generics over values yet, hrm. So this doesn’t quite get us to our Arrays.* methods, dang.

OK, let’s put this idea on hold for a bit.


       Zag before you zig

We could write an erased version:

|void fill(NativeArray, Object) |

which will catch all the flat value arrays, but we lose the compile-time type checking that the element type is correct.

But, we may be able to pull a move here. What if we were to allow any-vars (in L10) /as long as the resulting signature was invariant in T/? Then we could say (with erased generics!)

|<any T> void fill(NativeArray<T>, T.box element) |

because this would erase to |(NativeArray, Object)V|.In other words, a small down payment on specialized generics (where it supports compile-time type checking, but wouldn’t actually affect any bytecode generation), might allow us to write the generic-over-arrays code we want.

So, the compiler would type-check that the passed value is an instance of (or convertible to) the box type for which the passed array is an array, and then erase the array to NativeArray and erase the element to its box. This would even work for |int[]| arrays.

Then, in L100, we migrate NativeArray to be a true specializable interface, and we migrate the |fill| method to

|<any T> void fill(NativeArray<T>, T element) |


       Summary

There are lots of holes to fill in, but:

 * We need a suitable translation target for |T[]| in specialized
   generics regardless. One path is full covariance (including for
   primitive arrays); another is to migrate the relatively few methods
   that truck in |T[]| to a different translation.
 * We are going to need some tools for signature migration no matter
   what. I’ve outlined two under separate cover; one is minty-bridges
   (where a method/field says “I am willing to respond to this
   alternate descriptor”), and the other is a technique for migrating
   signatures so that if subclasses override the old signature, we have
   a way to restore order. We are going to need these to get to L100
   regardless.
 * We can borrow some bits from the future to provide a programming
   model for value arrays that is not tied to covariance, and that
   heals the rift between separate array types, which currently are
   completely unrelated. It also moves megamorphic call sites to where
   they belong — |invokeinterface|.

​

Reply via email to