Array covariance

Brian Goetz Thu, 25 Oct 2018 12:05:09 -0700

Since the Burlington meeting, Simms has been prodding me to offer an alternative to supporting array covariance for value arrays. John and I kicked around the following idea yesterday. At this point, I’d call it a half-baked sketch, but so far it seems promising.


       Why covariance at all?

First, let’s talk about why arrays in Java are covariant in the first place. The reason is that, unless you have either generics or covariant arrays, you can’t write methods that operate on all (or almost all) array types, like Arrays.sort(). With covariance, we can write


|void sort(Object[], Comparator) { … } |

and we can sort any (reference) array. I wasn’t in the room, but I suspect that array covariance was something that was grudgingly accepted because it was the only way to write code that worked on arbitrary arrays.
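As a small illustration of the point above, here is a sketch in today’s Java of a single `sort(Object[], Comparator)` method serving a `String[]`, together with the run-time store check that covariance brings with it. The class and helper names (`CovariantSortDemo`, `storeIsRejected`) are made up for this example.

```java
import java.util.Arrays;
import java.util.Comparator;

class CovariantSortDemo {
    // One method serves every reference array type, because String[] is a subtype of Object[].
    @SuppressWarnings({"rawtypes", "unchecked"})
    static void sort(Object[] array, Comparator comparator) {
        Arrays.sort(array, comparator);
    }

    // The price of covariance: element stores are type-checked at run time.
    // Hypothetical helper: reports whether storing 'element' into slot 0 is rejected.
    static boolean storeIsRejected(Object[] array, Object element) {
        try {
            array[0] = element;
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String[] names = {"pear", "apple", "fig"};
        Comparator<String> order = Comparator.naturalOrder();
        sort(names, order);
        System.out.println(Arrays.toString(names));      // [apple, fig, pear]
        System.out.println(storeIsRejected(names, 42));  // true: an Integer cannot go into a String[]
    }
}
```

Note that an `int[]` cannot be passed to this `sort` at all, which is the gap the rest of this note is concerned with.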

Now, imagine if we had (invariant) generics in Java 1.0. Then we would have written


|<T> void sort(T[], Comparator<T>) { … } |

This makes more sense from a user-model perspective, but what do we erase |T[]| to? Unless we also had covariant arrays, we’d not be able to erase |T[]| to |Object[]|, but there may be a smaller hammer we can use than covariant arrays for this.

Covariance isn’t bad in itself, but the limitations of Java 1.0 arrays belie the problem: layout polymorphism. Covariance among arrays of Object subtypes is fine because they all have the same layout, but primitive arrays were left out from the beginning. We could extend covariance to value/primitive arrays (we’ve proven it’s possible), but there’s a cost, which is largely this: we take what is a clean and predictable performance model for array access, and pollute it with the possibility of megamorphic access sites.



       What else is missing about arrays?

Just as primitives are outside of the generic type system (we can’t say |ArrayList<int>|, yet), so are arrays. There’s no way to express “any array” in the type system; methods like |System.arraycopy| are forced to use |Object| as a proxy, and then dynamically ensure that what’s passed is actually an array:

|void arraycopy(Object sourceArray, int sourceStart, Object destArray, int destStart, int length) |
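To make the “dynamically ensure” part concrete, the following sketch calls the real `System.arraycopy` with arguments that type-check statically (everything is an `Object`) but fail at run time. The wrapper `tryCopy` is a hypothetical helper for this example only.

```java
class ArrayCopyDemo {
    // Hypothetical helper: returns "ok" if the copy succeeds, otherwise the exception's simple name.
    static String tryCopy(Object src, Object dest, int length) {
        try {
            System.arraycopy(src, 0, dest, 0, length);
            return "ok";
        } catch (ArrayStoreException | IndexOutOfBoundsException | NullPointerException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Both arguments are int[]: the copy succeeds.
        System.out.println(tryCopy(new int[] {1, 2, 3}, new int[3], 3));
        // The source is not an array at all; the compiler cannot tell, so the check happens at run time.
        System.out.println(tryCopy("not an array", new int[3], 1));
        // Both are arrays, but with different primitive component types.
        System.out.println(tryCopy(new int[] {1}, new long[1], 1));
    }
}
```

The last two calls both report `ArrayStoreException`, which is how `arraycopy` signals that an argument is not an array or that the component types do not match.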

Similarly, there are lots of operations on arrays that are painful, ad-hoc, and require VM intrinsification to perform well, like |Arrays.copyOf()|.

(Plus, there’s the usual litany of array problems: no final elements, no volatile elements, fixed layout, etc.)



       Arrays are primitive

The reason for having arrays in the language and VM is the same reason we have primitives: they are a compromise of OO principles to gain a practical performance model. There are few good things about arrays, but one of them is that they have simple, transparent cost models in time and space. Polluting |Object[]| with layout-polymorphism compromises the cost model.


The analogy we should be thinking of is:

   array is-to collection as primitive is-to object

That is, arrays are the ground types that our higher-typed programs bottom out in (or specialize to), not necessarily the types we want in people’s faces.



       What would arrays look like in Valhalla?

Obviously, we’d like for value arrays to be supported at least as well as other arrays. Right now, without array covariance, we’re back in the same situation the Java 1.0 designers were in: if you want to write |Arrays.sort()|, you need covariance. Currently we have no way to extend the various nine-way method sets, even if we were willing to write a tenth method.

But, in Valhalla, we don’t want 9- or 10-way method sets; we want a single method. We want for |Arrays.fill()|, for example, to be something like:


|<any T> void fill(T[] array, T element) |

and not have to have eight siblings to clean up the primitive cases.

But, even this syntax is paying homage to the irregularity of array-as-primitive. I think what we’d really want is something like

|interface NativeArray<any T> { int length(); T get(int index); void set(int index, T val); Class<T> componentType(); } |

and then we’d retrofit |Foo[]| to implement |NativeArray<Foo>|. (We’d likely make NA a restricted interface, that is only implemented by native arrays, for now.) Then, our single fill method could be (in LW100):


|<any T> void fill(NativeArray<T> array, T element) |

(Impatient readers will want to point out that we could generify over the index parameter as well. One impossible problem at a time.)


And similarly, arraycopy can be:

|void arraycopy(NativeArray src, int srcStart, NativeArray dest, int destStart, int len) |

Bold claim: In such a world, /there is no need for arrays to be covariant/. If you want to operate on more than one kind of array, use generics. If you want covariance, say |NativeArray<? extends T>|.
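A rough sketch of what use-site covariance via `NativeArray<? extends T>` could look like, written in today’s Java. Arrays do not implement any such interface today, so the nested `NativeArray<T>` here (with `componentType()` left out for brevity) and the adapter `of` are stand-ins invented for the example, and it only covers reference arrays.

```java
class WildcardDemo {
    // Hypothetical stand-in for the interface in the text; today's arrays do not implement it.
    interface NativeArray<T> {
        int length();
        T get(int index);
        void set(int index, T val);
    }

    // Hypothetical adapter so an ordinary reference array can be viewed as a NativeArray<T>.
    static <T> NativeArray<T> of(T[] array) {
        return new NativeArray<T>() {
            public int length() { return array.length; }
            public T get(int index) { return array[index]; }
            public void set(int index, T val) { array[index] = val; }
        };
    }

    // Use-site covariance: accepts NativeArray<Integer>, NativeArray<Double>, and so on.
    static double sum(NativeArray<? extends Number> a) {
        double total = 0;
        for (int i = 0; i < a.length(); i++) {
            total += a.get(i).doubleValue();
        }
        // a.set(0, 1.5);  // does not compile: writes through "? extends" are rejected statically
        return total;
    }

    public static void main(String[] args) {
        NativeArray<Integer> ints = of(new Integer[] {1, 2, 3});
        NativeArray<Double> doubles = of(new Double[] {0.5, 1.5});
        System.out.println(sum(ints));     // 6.0
        System.out.println(sum(doubles));  // 2.0
    }
}
```

The invariant type `NativeArray<Integer>` is not assignable to `NativeArray<Number>`, so the unsound store that array covariance has to catch with `ArrayStoreException` is ruled out at compile time instead.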



       Getting there in steps

Suppose we get there in steps, starting with an erased interface suitable for later generification:

|interface NativeArray { int length(); Object get(int index); void set(int index, Object val); Class<?> componentType(); } |

|void fill(NativeArray a, Object element) { for (int i = 0; i < a.length(); i++) a.set(i, element); } |

Note that this works for all arrays now (values, primitives, objects), with some boxing. OK, progress.
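The erased interface can be approximated in current Java to show the “works for all arrays, with some boxing” behavior. This is only a sketch: the adapter `view` is hypothetical and uses core reflection (`java.lang.reflect.Array`) where the proposal would have arrays implement the interface directly.

```java
import java.lang.reflect.Array;
import java.util.Arrays;

class ErasedNativeArrayDemo {
    // The erased interface from the text.
    interface NativeArray {
        int length();
        Object get(int index);
        void set(int index, Object val);
        Class<?> componentType();
    }

    // Hypothetical adapter: wraps any array, primitive or reference, via core reflection.
    static NativeArray view(Object array) {
        if (!array.getClass().isArray()) {
            throw new IllegalArgumentException("not an array");
        }
        return new NativeArray() {
            public int length() { return Array.getLength(array); }
            public Object get(int index) { return Array.get(array, index); }          // boxes primitives
            public void set(int index, Object val) { Array.set(array, index, val); }  // unboxes primitives
            public Class<?> componentType() { return array.getClass().getComponentType(); }
        };
    }

    // A single fill method for every kind of array.
    static void fill(NativeArray a, Object element) {
        for (int i = 0; i < a.length(); i++) {
            a.set(i, element);
        }
    }

    public static void main(String[] args) {
        int[] ints = new int[3];
        String[] strings = new String[2];
        fill(view(ints), 7);
        fill(view(strings), "x");
        System.out.println(Arrays.toString(ints));       // [7, 7, 7]
        System.out.println(Arrays.toString(strings));    // [x, x]
        System.out.println(view(ints).componentType());  // int
    }
}
```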

One obvious wart in the above is that we’ve taken a step back in terms of type safety. We’d like for NA to be generic in its element type, but we can’t yet use values or primitives as type parameters. Suppose we said instead:

|interface NativeArray<T> { int length(); T get(int index); void set(int index, T val); Class<T> componentType(); } |


|<T> void fill(NativeArray<T> a, T element) |



       Overloading

The |NA| story works nicely with existing overload rules too. If we currently have:


|fill(int[], int) fill(Object[], Object) |

we can add in an overload

|<T> fill(NativeArray<T>, T) |

Existing source and binary code that wants one of the original methods will continue to get it, as |Object[]| is more specific than |NativeArray<? extends Object>| and so will be selected in preference to the NA version. So the new method would only be selected by value instantiations, since none of the existing versions are applicable.
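The overload-selection argument can be tried out today with an analogy. In the sketch below, `java.io.Serializable` plays the role of `NativeArray`, purely because it is an interface that every array type already implements; the three `fill` methods are illustrative and only report which overload was chosen.

```java
import java.io.Serializable;

class OverloadDemo {
    // The two pre-existing, hand-written overloads.
    static String fill(int[] a, int v) { return "int[]"; }
    static String fill(Object[] a, Object v) { return "Object[]"; }

    // Stand-in for the new NativeArray overload (assumption: Serializable substitutes for NativeArray).
    static String fill(Serializable a, Object v) { return "interface"; }

    public static void main(String[] args) {
        // Existing call sites keep resolving to the methods they always did.
        System.out.println(fill(new int[1], 1));        // int[]
        System.out.println(fill(new String[1], "s"));   // Object[] (more specific than the interface)
        // An array type with no dedicated overload falls through to the interface version.
        System.out.println(fill(new long[1], 1L));      // interface
    }
}
```

In the analogy, `long[]` stands for a value array: none of the older overloads is applicable, so only the interface-typed method can be selected.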

(Over time, we might want to remove some of the hand-specialized versions, and turn them into bridges for the specialized generic version, reducing us from 9 methods to one method with 8 bridges.)



       Translation

Obviously, we’d have a binary compatibility issue, that we’d have to fill in with bridges. And these bridges might conflict with actual overloads that take |NativeArray|:


|void m(T[] array) { … } void m(NativeArray array) { … } |

but, maybe since there’s no existing code that uses NA, that’s OK.

This triggers all the issues discussed recently regarding migration: there’s a 2x2 matrix of { source, binary } x { client, subclass } compatibility.

|// Stream<T>
T[] toArray(IntFunction<T[]> generator)
// Arrays
<T> T[] copyOf(T[] array) |

So erasing to NA, with bridges, and performing the same generic type checks we do (perhaps with some extra casts) may be a viable path for managing the source compatibility aspects.

OK, so if we erase |T[]| to NativeArray, what does that buy us? It /almost/ means we can write our |Arrays.*| methods with an extra overload:


|<T> void fill(NativeArray<T>, T) |

But, we don’t have generics over values yet, hrm. So this doesn’t quite get us to our Arrays.* methods, dang.


OK, let’s put this idea on hold for a bit.


       Zag before you zig

We could write an erased version:

|void fill(NativeArray, Object) |

which will catch all the flat value arrays, but we lose the compile-time type checking that the element type is correct.
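To see what losing the compile-time check means in practice, here is a sketch of an erased fill in current Java. It takes a plain `Object` where the text has `NativeArray` and relies on `java.lang.reflect.Array`, so it is an approximation of the idea, not the proposed API; `failsAtRuntime` is a hypothetical helper.

```java
import java.lang.reflect.Array;

class ErasedFillCheckDemo {
    // Erased fill over any array, standing in for fill(NativeArray, Object).
    static void fill(Object array, Object element) {
        int length = Array.getLength(array);
        for (int i = 0; i < length; i++) {
            Array.set(array, i, element);  // the element type is only checked here, at run time
        }
    }

    // Hypothetical helper: reports whether the fill was rejected at run time.
    static boolean failsAtRuntime(Object array, Object element) {
        try {
            fill(array, element);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAtRuntime(new int[2], 5));       // false: Integer unboxes to int
        System.out.println(failsAtRuntime(new int[2], "five"));  // true: compiles fine, fails when run
    }
}
```

Both calls compile, because the signature says nothing about how the element relates to the array; the mismatch in the second call only shows up when the code runs.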

But, we may be able to pull a move here. What if we were to allow any-vars (in L10) /as long as the resulting signature was invariant in T/? Then we could say (with erased generics!)


|<any T> void fill(NativeArray<T>, T.box element) |

because this would erase to |(NativeArray, Object)V|. In other words, a small down payment on specialized generics (where it supports compile-time type checking, but wouldn’t actually affect any bytecode generation) might allow us to write the generic-over-arrays code we want.

So, the compiler would type-check that the passed value is an instance of (or convertible to) the box type for which the passed array is an array, and then erase the array to NativeArray and erase the element to its box. This would even work for |int[]| arrays.

Then, in L100, we migrate NativeArray to be a true specializable interface, and we migrate the |fill| method to


|<any T> void fill(NativeArray<T>, T element) |


       Summary

There are lots of holes to fill in, but:

 * We need a suitable translation target for |T[]| in specialized
   generics regardless. One path is full covariance (including for
   primitive arrays); another is to migrate the relatively few methods
   that truck in |T[]| to a different translation.
 * We are going to need some tools for signature migration no matter
   what. I’ve outlined two under separate cover; one is minty-bridges
   (where a method/field says “I am willing to respond to this
   alternate descriptor”), and the other is a technique for migrating
   signatures so that if subclasses override the old signature, we have
   a way to restore order. We are going to need these to get to L100
   regardless.
 * We can borrow some bits from the future to provide a programming
   model for value arrays that is not tied to covariance, and that
   heals the rift between separate array types, which currently are
   completely unrelated. It also moves megamorphic call sites to where
   they belong — |invokeinterface|.
