Re: Wildcards -- Models 4 and 5

Andrey Breslav Sat, 28 May 2016 11:35:13 -0700

My gut feeling is also for Model 5.

And I even dare ask this: can we maybe retire at least some of the raw
types legacy somehow?
I can't say I've explored that direction in any real depth, but maybe
someone else did?


On Thu, May 26, 2016 at 4:36 PM Bjorn B Vardal <[email protected]> wrote:

> We agree that the potential source incompatibility is an acceptable price
> for the reduced bytecode complexity in Model 5. If the source
> incompatibility turns out to be more severe than expected, does it make
> more sense to bring back separate wildcards (?/ref, any), rather than
> bringing back the bytecode complexity of Model 4?
>
> --
> Bjørn Vårdal
>
>
> ----- Original message -----
> From: Brian Goetz <[email protected]>
> Sent by: "valhalla-spec-experts" <
> [email protected]>
> To: [email protected]
> Cc:
> Subject: Wildcards -- Models 4 and 5
> Date: Fri, May 20, 2016 2:36 PM
>
>
> In the 4/20 mail “Wildcards and raw types: story so far”, we outlined our
> explorations for fitting wildcard types into the first several prototypes.
> The summary was:
>
>    - Model 1: no wildcards at all
>    - Model 2: A pale implementation of wildcards, with lots of problems
>    that stem from trying to fake wildcards via interfaces
>    - Model 3: basically the same as Model 2, except members are accessed
>    via indy (which mitigated some of the problems but not all)
>
>    The conclusion was: compiler-driven translation tricks are not going
>    to cut it (as we suspected all along). We’ve since explored two other
>    models (call them 4 and 5) which explore a range of options for VM support
>    for wildcards. The below is a preliminary analysis of these options.
>
> Reflection, classes, and runtime types
>
> While it may not be immediately obvious that this subject is deeply
> connected to reflection, consider a typical implementation of equals():
>       class Box<T> {
>           T t;
>
>           public boolean equals(Object o) {
>               if (!(o instanceof Box))
>                   return false;
>               Box other = (Box) o;
>               return (t == null) ? other.t == null
>                                  : t.equals(other.t);
>           }
>       }
>
> Some implementations use raw types (Box) for the instanceof and cast
> target; others use wildcards (Box<?>). While the latter is recommended,
> both are in wide circulation. In any case, as observed in the last
> mail, were we to interpret Box or Box<?> as only including erased boxes,
> then this code would silently break.
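
For comparison, here is a minimal sketch of the same equals() written against the wildcard form, as it compiles in today's (erased) Java. The constructor, the hashCode() override, the use of Objects.equals for null handling, and the BoxEqualsDemo class are additions for this sketch and are not part of the original mail.

```java
import java.util.Objects;

// Sketch: Box.equals() using Box<?> as the instanceof and cast target.
class Box<T> {
    final T t;

    Box(T t) { this.t = t; }

    @Override
    public boolean equals(Object o) {
        // Box<?> is a reifiable type, so it is legal as an instanceof target.
        if (!(o instanceof Box<?>))
            return false;
        Box<?> other = (Box<?>) o;
        // Objects.equals covers the cases where either field is null.
        return Objects.equals(t, other.t);
    }

    @Override
    public int hashCode() { return Objects.hashCode(t); }
}

class BoxEqualsDemo {
    public static void main(String[] args) {
        System.out.println(new Box<>("a").equals(new Box<>("a")));
        System.out.println(new Box<String>(null).equals(new Box<>("a")));
    }
}
```

If Box<?> were read as "erased boxes only", the instanceof test here would start rejecting specialized boxes without any compile-time signal, which is the silent breakage the paragraph above describes.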
>
> The term “class” is horribly overloaded, used to describe the source class
> (class Foo { ... }), the binary classfile, the runtime type derived from
> the classfile, and the reflective mirror for that runtime type. In the past
> these existed in 1:1 correspondence, but no more — a single source class
> now gives rise to a number of runtime types. Having poor terminology causes
> confusion, so let’s refine these terms:
>
>    - *class* refers to a source-level class declaration
>    - *classfile* refers to the binary classfile
>    - *template* refers to the runtime representation of a classfile
>    - *runtime type* refers to a primitive, value, class, or interface
>    type managed by the VM
>
> So historically, all objects had a class, which equally described the
> source class, the classfile, and the runtime type. Going forward, the class
> and the runtime type of an object are distinct concepts. So an
> ArrayList<int> has a *class* of ArrayList, but a *runtime type* of
> ArrayList<int>. Our code name for runtime type is *crass* (obviously a
> better name is needed, but we’ll paint that bikeshed later.)
>
> This allows us to untangle a question that’s been bugging us: what should
> Object.getClass() return on an ArrayList<int>? If we return ArrayList,
> then we can’t distinguish between an erased and a specialized object (bad);
> if we return ArrayList<int>, then existing code that depends on
> (x.getClass() == ArrayList.class) may break (bad).
>
> The answer is, of course, that there are two questions the user can ask an
> object: what is your *class*, and what is your *crass*, and they need to
> be detangled. The existing method getClass() will continue to return the
> class mirror; a new method (getCrass()) will return a runtime type mirror
> of some form for the runtime type. Similarly, a class literal will evaluate
> to a class, and some other form of literal / reflective lookup will be
> needed for crass.
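
A small sketch of what getClass() observes today may help here. It only uses existing Java APIs; the class and method names (ClassMirrorDemo, isExactlyArrayList, sameClass) are made up for illustration, and the getCrass() method named in the mail is a proposal that is deliberately not called.

```java
import java.util.ArrayList;
import java.util.List;

class ClassMirrorDemo {
    // Exact-class test of the kind that existing code relies on.
    static boolean isExactlyArrayList(Object x) {
        return x.getClass() == ArrayList.class;
    }

    // Under erasure, every parameterization shares a single class mirror,
    // so getClass() cannot tell ArrayList<String> from ArrayList<Integer>.
    static boolean sameClass(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        System.out.println(sameClass(strings, ints));
        System.out.println(isExactlyArrayList(strings));
    }
}
```

Keeping getClass() as the class mirror preserves both of these checks; distinguishing parameterizations would then be the job of the separate runtime type mirror.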
>
> The reflective features built into the language (instanceof, casting,
> class literals, getClass()) are mostly tilted towards classes, not types.
> (Some exceptions: you can use a wildcard type in an instanceof, and you
> can do unchecked static casts to generic types, which are erased.) We need
> to extend these to deal in both classes *and* crasses. For getClass() and
> literals, there’s an obvious path: have two forms. For casting, we are
> mostly there (except for the treatment of raw types for any-generic classes
> — which we need to work out separately.) For instanceof, it seems a forced
> move that instanceof Foo is interpreted as “an instance of any runtime
> type projected from class Foo”, but we also would want to apply it to any
> reifiable type as well.
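
The two exceptions mentioned above can be sketched in today's Java as follows. The names ReflectiveFeaturesDemo, isList, and asStrings are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class ReflectiveFeaturesDemo {
    // instanceof accepts a wildcard type because List<?> is reifiable.
    static boolean isList(Object o) {
        return o instanceof List<?>;
    }

    // A cast to a parameterized type is unchecked: at runtime only the
    // erased class (List) is tested, so the element type is not verified.
    @SuppressWarnings("unchecked")
    static List<String> asStrings(Object o) {
        return (List<String>) o;
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        ints.add(42);
        // The cast succeeds even though the elements are not Strings.
        List<String> fake = asStrings(ints);
        System.out.println(isList(ints) + " " + fake.size());
    }
}
```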
>
> Wildcard types
>
> In Model 3, we express a parameterized type with a ParamType constant,
> which names a template class and a set of type parameters, which include
> both valid runtime types as well as the special type parameter token
> erased. One natural way to express a wildcard type is to introduce a new
> special type parameter token, wild, so we’d translate Foo<any> as
> ParamType[Foo,wild].
>
> In order for wildcard types to work seamlessly, the minimum functionality
> we’d need from the VM is to manage subtyping (which is used by the VM for
> instanceof, checkcast, verification, array store checks, and array
> covariance.) The wildcard must be seen to be a “top” type for all
> parameterizations:
> ParamType[Foo,T] <: ParamType[Foo,wild]  // for all valid T
>
> And, wildcard parameterizations must be seen to be subtypes of their
> wildcard-parameterized supertypes. If we have
>        class Foo<any T> extends Bar<T> implements I<T>       { ... }
>        class Moo<any T> extends Goo { }
>
> then we expect
> ParamType[Foo,wild] <: ParamType[Bar,wild]
> ParamType[Foo,wild] <: ParamType[I,wild]
> ParamType[Moo,wild] <: Goo
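
The subtyping rules above can be modeled with a toy checker. This is only a sketch of the rules as stated, not of any VM implementation: the WildSubtyping class, its string-based encoding of types (a null argument standing for a plain class type), and the single-type-parameter restriction are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WildSubtyping {
    static final String WILD = "wild";

    // A direct supertype edge; passesT records whether the supertype is
    // parameterized by the subclass's own type variable T.
    static final class Super {
        final String template;
        final boolean passesT;
        Super(String template, boolean passesT) {
            this.template = template;
            this.passesT = passesT;
        }
    }

    static final Map<String, List<Super>> SUPERS = new HashMap<>();
    static {
        // class Foo<any T> extends Bar<T> implements I<T>
        SUPERS.put("Foo", Arrays.asList(new Super("Bar", true), new Super("I", true)));
        // class Moo<any T> extends Goo
        SUPERS.put("Moo", Arrays.asList(new Super("Goo", false)));
    }

    // Is ParamType[subT,subArg] a subtype of ParamType[supT,supArg]?
    // A null argument models a non-parameterized class type such as Goo.
    static boolean isSubtype(String subT, String subArg, String supT, String supArg) {
        if (subT.equals(supT)) {
            if (subArg == null || supArg == null)
                return subArg == null && supArg == null;
            // wild is the top parameterization; otherwise arguments must match.
            return supArg.equals(WILD) || supArg.equals(subArg);
        }
        for (Super s : SUPERS.getOrDefault(subT, Collections.<Super>emptyList())) {
            String arg = s.passesT ? subArg : null;
            if (isSubtype(s.template, arg, supT, supArg))
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSubtype("Foo", "int", "Foo", WILD));
        System.out.println(isSubtype("Foo", WILD, "Bar", WILD));
        System.out.println(isSubtype("Moo", WILD, "Goo", null));
    }
}
```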
>
> Wildcards must also support method invocation and field access to the
> members that are in the intersection of the members of all
> parameterizations (these are the total members, i.e. those not restricted
> to particular instantiations, whose member descriptors do not contain any
> type variables). We can continue to implement member access via
> invokedynamic (as we do in Model 3), or alternatively, the VM can support
> invoke* bytecodes on wildcard receivers.
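
As a loose source-level analogy in today's erased Java (not the any-wildcard itself), member access through List<?> looks like this. The names WildcardMembersDemo, count, and first are illustrative.

```java
import java.util.Arrays;
import java.util.List;

class WildcardMembersDemo {
    // size() does not mention the type variable, so its descriptor is the
    // same for every parameterization and it is usable through List<?>.
    static int count(List<?> xs) {
        return xs.size();
    }

    // get() returns the type variable. It is usable here only because
    // erasure gives every parameterization the same descriptor, returning
    // Object; a specialized List<int> would not share that descriptor.
    static Object first(List<?> xs) {
        return xs.get(0);
    }

    // xs.add("x") would not compile through List<?>: the parameter type is
    // the unknown type argument.

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b");
        System.out.println(count(names) + " " + first(names));
    }
}
```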
>
> We can apply these wildcard behaviors to any of the wildcard models (i.e.,
> retrofit them onto Model 2/3.)
>
> Partial wildcards
>
> With multiple type variables, the rules for wildcards generalize cleanly,
> but the number of wildcard types that are a supertype of any given
> parameterized type grows exponentially in the number of type variables. We
> are considering adopting the simplification of erasing all partial
> wildcards in the source type system to a total wildcard in the runtime type
> system (the costs of this are: some additional boxing on access paths where
> boxing might not be necessary, and unchecked casts when casting a broader
> wildcard to a narrower one.)
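
To make the exponential growth concrete, the sketch below enumerates every wildcard form of a parameterization by keeping each type argument or replacing it with wild. Writing a multi-argument type as ParamType[Map,int,long] is an assumed extension of the single-argument notation used above, and PartialWildcards is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

class PartialWildcards {
    // Lists every wildcard form of template<args...>: each argument is
    // either kept or replaced by "wild", giving 2^n forms for n arguments
    // (the first entry is the original parameterization itself).
    static List<String> wildcardSupertypes(String template, String... args) {
        List<String> out = new ArrayList<>();
        int n = args.length;
        for (int mask = 0; mask < (1 << n); mask++) {
            StringBuilder sb = new StringBuilder("ParamType[").append(template);
            for (int i = 0; i < n; i++) {
                boolean wild = (mask & (1 << i)) != 0;
                sb.append(',').append(wild ? "wild" : args[i]);
            }
            out.add(sb.append(']').toString());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : wildcardSupertypes("Map", "int", "long"))
            System.out.println(t);
    }
}
```

Erasing partial wildcards to the total wildcard would collapse all of the mixed forms into the single all-wild entry at the end of this list.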
>
> Model 4
>
> A constraint we are under is: existing binaries translate the types Foo
> (raw type), Foo<String> (erased parameterization), and Foo<?> all as LFoo;
> (or its equivalent, CONSTANT_Class[Foo]); since existing code treats this
> as meaning an erased class, the natural path would be to continue to
> interpret LFoo; as an erased class.
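
The constraint can be observed directly with reflection: the raw, parameterized, and wildcard forms of a parameter all produce the same erased descriptor. The sketch uses java.util.List in place of Foo; the ErasedDescriptors class and its method names are illustrative.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.List;

class ErasedDescriptors {
    @SuppressWarnings("rawtypes")
    static void raw(List xs) { }
    static void ofString(List<String> xs) { }
    static void wildcard(List<?> xs) { }

    // Builds the JVM method descriptor from the erased reflective signature.
    static String descriptor(String name) {
        try {
            // All three methods are found with List.class as the parameter type.
            Method m = ErasedDescriptors.class.getDeclaredMethod(name, List.class);
            return MethodType.methodType(m.getReturnType(), m.getParameterTypes())
                             .toMethodDescriptorString();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(descriptor("raw"));
        System.out.println(descriptor("ofString"));
        System.out.println(descriptor("wildcard"));
    }
}
```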
>
> Model 4 asks the question: “can we reinterpret legacy LFoo; in
> classfiles, and Foo<?> in source files, as any Foo?” (restoring the
> interpretation of Foo<?> to be more in line with user intuition).
>
> Not surprisingly, the cost of reinterpreting the binaries is extensive.
> Many bytecodes would have to be reinterpreted, including new,
> {get,put}field, invoke*, to make up the difference between the legacy
> meaning of these constructs and the desired new meaning. Worse, while
> boxing provides us a means to have a common representation of signatures
> involving T (T’s bound), in order to get to a common representation for
> signatures involving T[], we’d need to either (a) make int[] a subtype of
> Object[] or (b) have a “boxing conversion” from int[] to Object[] (which
> would be a proxy box; the data would still live in the original int[].)
> Both are intrusive into the aaload and aastore bytecodes and still are
> not anomaly-free.
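
The array problem, and the flavor of option (b), can be sketched in today's Java. Because a Java array cannot be proxied, the "proxy box" below is modeled as a List view rather than as an Object[]; that substitution, and the names ArrayBoxingDemo, isObjectArray, and boxedView, are assumptions made for illustration.

```java
import java.util.AbstractList;
import java.util.List;

class ArrayBoxingDemo {
    // Today int[] is not a subtype of Object[], while Integer[] is.
    static boolean isObjectArray(Object o) {
        return o instanceof Object[];
    }

    // A "proxy box": a boxed view whose data still lives in the original int[].
    static List<Integer> boxedView(final int[] data) {
        return new AbstractList<Integer>() {
            @Override public Integer get(int i) { return data[i]; }
            @Override public Integer set(int i, Integer v) {
                int old = data[i];
                data[i] = v;   // write through to the backing array
                return old;
            }
            @Override public int size() { return data.length; }
        };
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        List<Integer> view = boxedView(a);
        view.set(0, 9);
        System.out.println(a[0] + " " + isObjectArray(a));
    }
}
```

A real proxy would have to behave like an Object[] under aaload and aastore, which is where the intrusiveness noted above comes from.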
>
> So, overall, while this seems possible, the implementation cost is very
> high, and all of it is incurred for the sake of migration; the resulting
> constraints would remain as legacy long after the old code has been migrated.
>
> Model 5
>
> Model 5 asks the simpler question: can we continue to interpret LFoo; as
> erased in legacy classfiles, but upgrade to treating Foo<?> as is
> expected in source code? This entails changing the compilation translation
> of Foo<?> from “erased foo” to ParamType[Foo,wild].
>
> This is far less intrusive into the bytecode behavior — legacy code would
> continue to mean what it did at compile time. It does require some
> migration support for handling the fact that field and method descriptors
> have changed (but this is a problem we’re already working on for managing
> the migration of reference classes to value classes.) There are also some
> possible source incompatibilities in the face of separate compilation (to
> be quantified separately).
>
> Model 5 allows users to keep their Foo<?> and have it mean what they
> think it should mean. So we don’t need to introduce a confusing Foo<any>
> wildcard, but we will need a way of saying “erased Foo”, which might be
> Foo<? extends Object> or might be something more compact like Foo<erased>.
>
> Comparison
>
> Comparing the three models for wildcards (2, 4, 5):
>
>    - Model 2 defines the source construct Foo<?> to permanently mean
>    Foo<erased ref>, even when Foo is anyfied, and introduces a new wildcard
>    Foo<any>, but maintains source and binary compatibility.
>    - Model 4 lets us keep Foo<?>, and retroactively redefines bytecode
>    behavior — so an old binary can still interoperate with a reified generic
>    instance, and will think a Foo<int> is really a Foo<Integer>.
>    - Model 5 redefines the *source* meaning of Foo<?> to be what users
>    expect, but because we don’t reinterpret old binaries, allows some source
>    incompatibility during migration.
>
> I think this pretty much explores the solution space. Our choices are:
> break the user model of what Foo<?> means, take a probably prohibitive
> hit to distort the VM to apply new semantics to old bytecode, or accept
> some limited source incompatibility under separate compilation but rescue
> the source form that users want.
>
> In my opinion, the Model 5 direction offers the best balance of costs and
> benefits — while there is some short-term migration pain (in relatively
> limited cases, and can be mitigated with compiler help), in the long run,
> it gets us to the world we want without permanently burdening either the
> language (creating confusion between Foo<?> and Foo<any>) or the VM
> implementation.
>
> In all these cases, we still haven’t defined the semantics of *raw types*.
> Raw types existed for migration between pre-generic and generic code; we
> still have that migration problem, plus the new migration problems of
> generic to any-generic, and of pre-generic to any-generic. So in any case,
> we’re going to need to define suitable semantics for raw types
> corresponding to any-generic classes.
> 
>
>
>
-- 
Andrey Breslav
Project Lead of Kotlin
JetBrains
http://kotlinlang.org/
The Drive to Develop
