My gut feeling is also for Model 5. And I'd even dare to ask: could we perhaps retire at least some of the raw-types legacy somehow? I can't say I've explored that direction in any real depth, but maybe someone else has?

On Thu, May 26, 2016 at 4:36 PM Bjorn B Vardal <[email protected]> wrote:

> We agree that the potential source incompatibility is an acceptable price for the reduced bytecode complexity in Model 5. If the source incompatibility turns out to be more severe than expected, does it make more sense to bring back separate wildcards (?/ref, any), rather than bringing back the bytecode complexity of Model 4?
>
> --
> Bjørn Vårdal
>
>
> ----- Original message -----
> From: Brian Goetz <[email protected]>
> Sent by: "valhalla-spec-experts" <[email protected]>
> To: [email protected]
> Cc:
> Subject: Wildcards -- Models 4 and 5
> Date: Fri, May 20, 2016 2:36 PM
>
> In the 4/20 mail “Wildcards and raw types: story so far”, we outlined our explorations for fitting wildcard types into the first several prototypes. The summary was:
>
> - Model 1: no wildcards at all
> - Model 2: a pale implementation of wildcards, with lots of problems that stem from trying to fake wildcards via interfaces
> - Model 3: basically the same as Model 2, except members are accessed via indy (which mitigated some of the problems, but not all)
>
> The conclusion was: compiler-driven translation tricks are not going to cut it (as we suspected all along). We’ve since explored two other models (call them 4 and 5), which cover a range of options for VM support for wildcards. Below is a preliminary analysis of these options.
>
> Reflection, classes, and runtime types
>
> While it may not be immediately obvious that this subject is deeply connected to reflection, consider a typical implementation of equals():
>
>     class Box<T> {
>         T t;
>
>         public boolean equals(Object o) {
>             if (!(o instanceof Box))
>                 return false;
>             Box other = (Box) o;
>             // null-safe: avoids calling t.equals(...) when t is null
>             return (t == null) ? other.t == null
>                                : t.equals(other.t);
>         }
>     }
>
> Some implementations use raw types (Box) for the instanceof and cast target; others use wildcards (Box<?>).
> While the wildcard form is recommended, both forms are in wide circulation. In any case, as observed in the last mail, were we to interpret Box or Box<?> as including only erased boxes, this code would silently break.
>
> The term “class” is horribly overloaded: it is used to describe the source class (class Foo { ... }), the binary classfile, the runtime type derived from the classfile, and the reflective mirror for that runtime type. In the past these existed in 1:1 correspondence, but no more: a single source class now gives rise to a number of runtime types. Poor terminology causes confusion, so let’s refine these terms:
>
> - *class* refers to a source-level class declaration
> - *classfile* refers to the binary classfile
> - *template* refers to the runtime representation of a classfile
> - *runtime type* refers to a primitive, value, class, or interface type managed by the VM
>
> So historically, all objects had a class, which equally described the source class, the classfile, and the runtime type. Going forward, the class and the runtime type of an object are distinct concepts. So an ArrayList<int> has a *class* of ArrayList, but a *runtime type* of ArrayList<int>. Our code name for runtime type is *crass* (obviously a better name is needed, but we’ll paint that bikeshed later).
>
> This allows us to untangle a question that’s been bugging us: what should Object.getClass() return on an ArrayList<int>? If we return ArrayList, then we can’t distinguish between an erased and a specialized object (bad); if we return ArrayList<int>, then existing code that depends on (x.getClass() == ArrayList.class) may break (bad).
>
> The answer is, of course, that there are two questions the user can ask an object: what is your *class*, and what is your *crass*? The two need to be disentangled.
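To see the compatibility hazard concretely, here is a small sketch in present-day Java (GetClassToday and isPlainArrayList are hypothetical names, not proposed API). Under erasure, every parameterization of ArrayList reports the same class, so a legacy check such as x.getClass() == ArrayList.class holds regardless of type arguments. If getClass() on an ArrayList<int> returned a distinct runtime type instead, checks like this one would start failing.

```java
import java.util.ArrayList;
import java.util.List;

// Present-day Java; GetClassToday and isPlainArrayList are hypothetical names.
final class GetClassToday {
    // A legacy-style check that assumes getClass() returns the erased class.
    static boolean isPlainArrayList(Object x) {
        return x.getClass() == ArrayList.class;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        // Under erasure, both parameterizations share a single runtime class.
        System.out.println(strings.getClass() == ints.getClass()); // true
        System.out.println(isPlainArrayList(strings));             // true
        System.out.println(isPlainArrayList(ints));                // true
    }
}
```

This is the behavior the mail preserves by keeping getClass() class-oriented and routing runtime-type queries through a separate accessor (the placeholder getCrass()).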
>
> The existing method getClass() will continue to return the class mirror; a new method (getCrass()) will return a runtime type mirror of some form for the runtime type. Similarly, a class literal will evaluate to a class, and some other form of literal or reflective lookup will be needed for crasses.
>
> The reflective features built into the language (instanceof, casting, class literals, getClass()) are mostly tilted towards classes, not types. (Some exceptions: you can use a wildcard type in an instanceof, and you can do unchecked static casts to generic types, which are erased.) We need to extend these to deal in both classes *and* crasses. For getClass() and literals, there’s an obvious path: have two forms. For casting, we are mostly there (except for the treatment of raw types for any-generic classes, which we need to work out separately). For instanceof, it seems a forced move that instanceof Foo is interpreted as “an instance of any runtime type projected from class Foo”, but we would also want to apply it to any reifiable type.
>
> Wildcard types
>
> In Model 3, we express a parameterized type with a ParamType constant, which names a template class and a set of type parameters; these include both valid runtime types and the special type parameter token erased. One natural way to express a wildcard type is to introduce a new special type parameter token, wild, so we’d translate Foo<any> as ParamType[Foo,wild].
>
> In order for wildcard types to work seamlessly, the minimum functionality we’d need from the VM is to manage subtyping (which the VM uses for instanceof, checkcast, verification, array store checks, and array covariance). The wildcard must be seen to be a “top” type for all parameterizations:
>
>     ParamType[Foo,T] <: ParamType[Foo,wild]   // for all valid T
>
> And wildcard parameterizations must be seen to be subtypes of their wildcard-parameterized supertypes.
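The two subtyping rules above can be prototyped in a few lines of Java (16+, for records). This is a toy sketch under stated assumptions, not VM code. WildcardSubtyping, Decl, ParamType, and WILD are hypothetical names. Each class is assumed to have a single type variable that its generic supertypes receive unchanged (as in Foo<T> extends Bar<T>), and erased parameterizations, partial wildcards, and supertypes with fixed arguments such as Bar<String> are not modeled. Rule 1 makes ParamType[X,wild] the top of all parameterizations of X. Rule 2 walks supertypes, substituting the receiver's argument (possibly wild) for T.

```java
import java.util.List;
import java.util.Map;

// Toy model of the wildcard subtyping rules; every name here is hypothetical.
final class WildcardSubtyping {
    // Stand-in for the special type parameter token "wild".
    static final String WILD = "wild";

    // A class with one type variable T. genericSupers receive T unchanged
    // (Foo<T> extends Bar<T>); plainSupers are non-generic (Moo<T> extends Goo).
    record Decl(String name, List<String> genericSupers, List<String> plainSupers) {}

    // A runtime type: template plus one type argument, or arg == null for a
    // non-generic class such as Goo.
    record ParamType(String template, String arg) {
        @Override
        public String toString() {
            return arg == null ? template : "ParamType[" + template + "," + arg + "]";
        }
    }

    private final Map<String, Decl> decls;

    WildcardSubtyping(Map<String, Decl> decls) {
        this.decls = decls;
    }

    boolean isSubtype(ParamType s, ParamType t) {
        if (s.equals(t))
            return true;
        // Rule 1: ParamType[X,T] <: ParamType[X,wild] for every argument T.
        if (s.arg() != null && s.template().equals(t.template()) && WILD.equals(t.arg()))
            return true;
        Decl d = decls.get(s.template());
        if (d == null)
            return false; // no declaration recorded: no further supertypes to try
        // Rule 2: lift through supertypes, substituting s's argument (possibly wild) for T.
        for (String sup : d.genericSupers())
            if (isSubtype(new ParamType(sup, s.arg()), t))
                return true;
        for (String sup : d.plainSupers())
            if (isSubtype(new ParamType(sup, null), t))
                return true;
        return false;
    }

    public static void main(String[] args) {
        WildcardSubtyping m = new WildcardSubtyping(Map.of(
                "Foo", new Decl("Foo", List.of("Bar", "I"), List.of()),
                "Moo", new Decl("Moo", List.of(), List.of("Goo"))));
        ParamType fooWild = new ParamType("Foo", WILD);
        System.out.println(m.isSubtype(new ParamType("Foo", "int"), fooWild));                   // true
        System.out.println(m.isSubtype(fooWild, new ParamType("Bar", WILD)));                    // true
        System.out.println(m.isSubtype(fooWild, new ParamType("I", WILD)));                      // true
        System.out.println(m.isSubtype(new ParamType("Moo", WILD), new ParamType("Goo", null))); // true
    }
}
```

The main method prints the relations from the Foo/Bar/I/Moo/Goo example in the mail. Combining the rules also yields ParamType[Foo,int] <: ParamType[Bar,wild], by way of ParamType[Bar,int].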
>
> For example, if we have
>
>     class Foo<any T> extends Bar<T> implements I<T> { ... }
>     class Moo<any T> extends Goo { }
>
> then we expect
>
>     ParamType[Foo,wild] <: ParamType[Bar,wild]
>     ParamType[Foo,wild] <: ParamType[I,wild]
>     ParamType[Moo,wild] <: Goo
>
> Wildcards must also support method invocation and field access for the members that are in the intersection of the members of all parameterizations (these are the total members, i.e., those not restricted to particular instantiations, whose member descriptors do not contain any type variables). We can continue to implement member access via invokedynamic (as we do in Model 3), or alternately the VM can support invoke* bytecodes on wildcard receivers.
>
> We can apply these wildcard behaviors to any of the wildcard models (i.e., retrofit them onto Model 2/3).
>
> Partial wildcards
>
> With multiple type variables, the rules for wildcards generalize cleanly, but the number of wildcard types that are a supertype of any given parameterized type grows exponentially in the number of type variables. We are considering adopting the simplification of erasing all partial wildcards in the source type system to a total wildcard in the runtime type system. (The costs of this are some additional boxing on access paths where boxing might not be necessary, and unchecked casts when casting a broader wildcard to a narrower one.)
>
> Model 4
>
> A constraint we are under is that existing binaries translate the types Foo (raw type), Foo<String> (erased parameterization), and Foo<?> all as LFoo; (or its equivalent, CONSTANT_Class[Foo]). Since existing code treats this as meaning an erased class, the natural path would be to continue to interpret LFoo; as an erased class.
>
> Model 4 asks the question: “Can we reinterpret legacy LFoo; in classfiles, and Foo<?> in source files, as any Foo?” (This would restore the interpretation of Foo<?> to be more in line with user intuition.)
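The legacy-binary constraint can be observed with ordinary reflection in today's Java. In this sketch (ErasedDescriptors and its methods are hypothetical names), three methods declare a raw List, a List<String>, and a List<?> parameter. All three are found by getDeclaredMethod using the same erased parameter class, List.class, because each compiles to the method descriptor (Ljava/util/List;)V; only the generic signature metadata, read back through getGenericParameterTypes(), tells them apart.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Type;
import java.util.List;

// Present-day Java; ErasedDescriptors and its methods are hypothetical names.
final class ErasedDescriptors {
    @SuppressWarnings("rawtypes")
    static void raw(List l) {}            // raw type, like Foo
    static void erased(List<String> l) {} // erased parameterization, like Foo<String>
    static void wild(List<?> l) {}        // wildcard, like Foo<?>

    // Looks a method up by its erased parameter class, List.class.
    static Method byErasure(String name) {
        try {
            return ErasedDescriptors.class.getDeclaredMethod(name, List.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("no method " + name + "(List)", e);
        }
    }

    public static void main(String[] args) {
        for (String name : List.of("raw", "erased", "wild")) {
            Method m = byErasure(name);
            Class<?> erasedParam = m.getParameterTypes()[0];     // List in all three cases
            Type genericParam = m.getGenericParameterTypes()[0]; // differs per method
            System.out.println(name + ": erased=" + erasedParam.getName()
                    + ", generic=" + genericParam.getTypeName());
        }
    }
}
```

Under Model 5 as described in the mail, the translation of Foo<?> would change to ParamType[Foo,wild], so a method shaped like wild above would presumably get a different descriptor once its class is any-generic and recompiled; that is the descriptor migration the mail refers to.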
>
> Not surprisingly, the cost of reinterpreting the binaries is extensive. Many bytecodes would have to be reinterpreted, including new, {get,put}field, and invoke*, to make up the difference between the legacy meaning of these constructs and the desired new meaning. Worse, while boxing provides us a means to have a common representation of signatures involving T (T’s bound), in order to get to a common representation for signatures involving T[], we’d need to either (a) make int[] a subtype of Object[] or (b) have a “boxing conversion” from int[] to Object[] (which would be a proxy box; the data would still live in the original int[]). Both are intrusive into the aaload and aastore bytecodes, and neither is anomaly-free.
>
> So, overall, while this seems possible, the implementation cost is very high, and all of it is paid for the sake of migration: the resulting legacy constraints would remain long after the old code has been migrated.
>
> Model 5
>
> Model 5 asks the simpler question: can we continue to interpret LFoo; as erased in legacy classfiles, but upgrade to treating Foo<?> as expected in source code? This entails changing the compiler's translation of Foo<?> from “erased Foo” to ParamType[Foo,wild].
>
> This is far less intrusive into the bytecode behavior: legacy code would continue to mean what it did at compile time. It does require some migration support for handling the fact that field and method descriptors have changed (but this is a problem we’re already working on for managing the migration of reference classes to value classes). There are also some possible source incompatibilities in the face of separate compilation (to be quantified separately).
>
> Model 5 allows users to keep their Foo<?> and have it mean what they think it should mean. So we don’t need to introduce a confusing Foo<any> wildcard, but we will need a way of saying “erased Foo”, which might be Foo<? extends Object> or might be something more compact like Foo<erased>.
>
> Comparison
>
> Comparing the three models for wildcards (2, 4, 5):
>
> - Model 2 defines the source construct Foo<?> to permanently mean Foo<erased ref>, even when Foo is anyfied, and introduces a new wildcard Foo<any>, but maintains source and binary compatibility.
> - Model 4 lets us keep Foo<?>, and retroactively redefines bytecode behavior, so an old binary can still interoperate with a reified generic instance, and will think a Foo<int> is really a Foo<Integer>.
> - Model 5 redefines the *source* meaning of Foo<?> to be what users expect, but because we don’t reinterpret old binaries, allows some source incompatibility during migration.
>
> I think this pretty much explores the solution space. Our choices are: break the user model of what Foo<?> means, take a probably prohibitive hit to distort the VM to apply new semantics to old bytecode, or accept some limited source incompatibility under separate compilation but rescue the source form that users want.
>
> In my opinion, the Model 5 direction offers the best balance of costs and benefits. While there is some short-term migration pain (in relatively limited cases, which can be mitigated with compiler help), in the long run it gets us to the world we want without permanently burdening either the language (creating confusion between Foo<?> and Foo<any>) or the VM implementation.
>
> In all these cases, we still haven’t defined the semantics of *raw types*. Raw types existed for migration between pre-generic and generic code; we still have that migration problem, plus the new migration problems of generic to any-generic, and of pre-generic to any-generic. So in any case, we’re going to need to define suitable semantics for raw types corresponding to any-generic classes.

-- 
Andrey Breslav
Project Lead of Kotlin
JetBrains
http://kotlinlang.org/
The Drive to Develop
