Type-dependent operations

Brian Goetz Thu, 24 Dec 2015 16:15:06 -0800

The previous discussion topic (which is still not remotely finished, you're not off the hook yet!) centered on migrating generic APIs. A related topic is that of migrating the /implementations/ of these APIs. This exploration has been informed by the prototype of Collections and Streams.

In the lucky case, no changes are needed to the bodies of methods after anyfying the API -- just adding "any" is all you need. However, not all cases are so lucky.

Here's a list of places where simply recompiling existing ref-generic code as any-generic could run into problems.

*Nullity.* References are nullable, values are not. (We'll have a separate discussion for "nullable values", which may be desirable for migration compatibility.)


*Variance.* References are polymorphic; values are not.

*Identity.* References have identity; values do not. For example, this means that values cannot be used as the lock object for a synchronized block.

*Object methods.* Value types will almost certainly have some form of equals, hashCode, toString, and getClass; they will almost certainly not have wait, notify, or notifyAll methods, and probably not clone either.

*Relationship with Object.* All reference types are subtypes of Object; value types are not. Similarly, arrays of reference types are subtypes of Object[]; arrays of value types are not.

*Array creation.* Currently, the idiom for creating a T[] array is to create an Object[] array, and statically cast it to T[]. This won't work with value arrays.

*Instanceof, casting, and type literals.* Instanceof doesn't permit a parameterized type on the RHS. However, this is not specific enough for specialized types; the runtime class of List<int> is different from List<String>. Casting does permit a parameterized type, but currently this is interpreted as a static cast. Similarly, type literals as currently formulated are not specific enough either.

*Wildcards.* The existing wildcard Foo<?> means (and must continue to mean) Foo<? extends Object>.
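Several of the items above can be observed in Java as it stands today. The following is a minimal sketch, not taken from the prototype; the class and method names (ErasureToday, newGenericArray, and so on) are hypothetical and chosen only for illustration. It shows the Object[]-and-cast idiom for generic arrays, the fact that reference instantiations of a generic class share a single runtime class, and the fact that a primitive array is not an Object[].

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; none of these names come from the original message.
class ErasureToday {
    // The usual idiom: allocate an Object[] and statically cast it to T[].
    // The cast is unchecked; at runtime the array is still an Object[].
    @SuppressWarnings("unchecked")
    static <T> T[] newGenericArray(int n) {
        return (T[]) new Object[n];
    }

    // Under erasure, List<String> and List<Integer> have the same runtime
    // class, so a runtime test cannot tell them apart.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    // A primitive array is an Object, but it is not an Object[].
    static boolean intArrayIsObjectArray() {
        Object a = new int[3];
        return a instanceof Object[];
    }

    public static void main(String[] args) {
        Object[] slots = newGenericArray(2);   // T is inferred as Object here
        System.out.println(slots.length);
        System.out.println(sameRuntimeClass());
        System.out.println(intArrayIsObjectArray());
    }
}
```

The primitive array stands in for a value array here; that substitution is an assumption of the sketch, since value types are not available in current Java.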

When dealing with quantities that might either be a reference or a value, such as an expression of type 'T' where T is an avar, the compiler must be conservative and only allow operations that can be proven safe for either references or values. So it would have to reject, for example, assigning a null to a T, since we don't know that null is a member of the domain for all T.

Obviously, we are not going to add value types and any-tvars to the language, and then not adjust the other places where type variables meet other language features. So clearly some of these issues will be addressed by extending the semantics of existing language features.

But, we don't necessarily *have* to change anything in order for things to "work". A method in an any-generic class could be "peeled" into a ref version and a value version:


class Foo<any T> {
    <where ref T>
    public void moo(T t) {
        // existing method body
    }

    <where val T>
    public void moo(T t) {
        // alternate method body, that steers clear of restrictions
    }
}

But, asking users to write their code twice would be rude, so we want to keep this sort of peeling to a bare minimum, and preserve it as an "escape hatch". If we're to minimize peeling, it stands to reason that either new linguistic forms need to be added, or existing forms be stretched to accommodate the broadened domain of genericity. Let's take these one at a time.

*Nullity.* We can further break this down into assignment to null and comparison to null.

Assignment to null is not going to fly. Our current prototype supports the expression T.default, which evaluates to the default value for whatever type T describes. For reference instantiations, this is null; for primitives, this is zero/false. Assignment to null can be replaced with assignment to T.default.
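The values that T.default would produce can be previewed in current Java by asking what a freshly allocated array element holds. This is only a rough emulation under stated assumptions: the helper name defaultOf is hypothetical, it takes a Class token rather than a type variable, and primitive defaults come back boxed. It is not how the prototype implements T.default.

```java
import java.lang.reflect.Array;

// Hypothetical helper class, for illustration only.
class Defaults {
    // Returns the value a new array element of the given type starts with:
    // null for reference types, zero/false (boxed) for primitives.
    static Object defaultOf(Class<?> type) {
        Object oneElementArray = Array.newInstance(type, 1);
        return Array.get(oneElementArray, 0);
    }

    public static void main(String[] args) {
        System.out.println(defaultOf(String.class));
        System.out.println(defaultOf(int.class));
        System.out.println(defaultOf(boolean.class));
    }
}
```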

For comparison with null, there are some options. In the prototype, we currently have a peeled generic method <any T> Any.isNull(), which returns false for value invocations. However, even swapping out == null for Any.isNull() is somewhat intrusive; we can define == null such that it constant folds to false for value instantiations. Then existing source code is unchanged (and there's no runtime overhead for the null check in value instantiations, since it's been folded away.)

When we look more closely at the possibility of nullable value types, this will have to be refined.
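The shape of a peeled isNull can be approximated in current Java with overloads, one per "peel". This is a sketch under assumptions, not the prototype's Any.isNull: the class name NullChecks is hypothetical, and the int and double overloads stand in for value instantiations in general.

```java
// Hypothetical class; overloads stand in for the ref and value peels.
class NullChecks {
    // Reference peel: the ordinary null test.
    static boolean isNull(Object o) {
        return o == null;
    }

    // Value peels: a primitive can never be null, so the answer is the
    // constant false and no runtime check is needed.
    static boolean isNull(int i) {
        return false;
    }

    static boolean isNull(double d) {
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isNull((Object) null));
        System.out.println(isNull("x"));
        System.out.println(isNull(42));
    }
}
```

Overload resolution picks the primitive version for a primitive argument without boxing, which mirrors the idea that the value case can be decided statically.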

*Variance.* We already fold "? extends T" and "? super T" to T when T is known to be a value type. (More specifically, we treat wildcards bounded by avars as a dependent type: if erased T then ? extends T, else T.)

*Identity.* There are a few cases here -- synchronization, reference comparison to an Object, System.identityHashCode. I think it makes sense to reject synchronization, and instead ask for more explicit lock-selection logic (perhaps appealing to a peeled <any T> Object lockFor(T) method). For reference comparison to Object, we can treat this as we propose above for comparison to null -- constant fold to false. For System.identityHashCode, we can peel this into something that uses ordinary hashCode for values.

*Object methods.* We've already discussed the Objectible<any T> interface, which would define the core methods equals, hashCode, toString, and getClass. Other Object methods (wait, notify, notifyAll, and probably clone) would not be available on any-T-valued expressions. (Arguably clone() could be the identity function on values, but this may not be worth it -- cloning is pretty broken.)

*Relationship with Object.* Assignment to Object could be accepted as a possibly-autoboxed operation, but I'm not sure this is a great idea -- it might be better to have an explicit toObject() method (maybe even on Objectible). Assignment of T[] to Object[] needs to be rejected, but in most cases Object[] can be replaced with T.erasure[] (just like replacing null with T.default.)

*Array creation.* The current prototype supports the expression form new T[n], which downgrades to Object[] when T is a reference type (and issues an unchecked warning.) Alternately, we could provide a reflective method <any T> T[] newArray(int), also with an unchecked warning. We can make the unchecked warning go away if the new expression were new T.erasure[n] (or the library version returned T.erasure).
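For comparison, here is roughly what a reflective array factory looks like in current Java. It is a sketch under assumptions, not the proposed <any T> T[] newArray(int): the names ArrayFactory, newArray and newAnyArray are hypothetical, and an explicit Class token replaces the type variable. The reference version needs an unchecked cast; the version that also accepts primitive component types can only promise Object, because an int[] is not an Object[].

```java
import java.lang.reflect.Array;

// Hypothetical factory class, for illustration only.
class ArrayFactory {
    // Reference component types: the result really is a T[] at runtime,
    // but the compiler cannot prove it, hence the unchecked cast.
    @SuppressWarnings("unchecked")
    static <T> T[] newArray(Class<T> component, int n) {
        return (T[]) Array.newInstance(component, n);
    }

    // Any component type, including primitives: the static type has to be
    // Object, since primitive arrays are not subtypes of Object[].
    static Object newAnyArray(Class<?> component, int n) {
        return Array.newInstance(component, n);
    }

    public static void main(String[] args) {
        String[] names = newArray(String.class, 3);
        System.out.println(names.length);
        System.out.println(newAnyArray(int.class, 2) instanceof int[]);
    }
}
```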

*Instanceof, casting, and type literals.* It is straightforward enough to extend instanceof to support "instanceof Foo<int>", and similarly for cast and type literals. We can do the same for the wildcard Foo<any>.

Supporting "instanceof Foo<T>" is trickier because T might be erased, and so it might not give you the answer you expect. Currently in a generic class Foo<T> you can ask if something is instanceof raw Foo, or of wildcard Foo<?>. The equivalent question with any-generics is more complicated; you want to express "If I am erased and the other is erased Foo, OR I am not erased and the other is the same instantiation of Foo as me." (This collapses to "do they have the same runtime class", but that's not really what we want to encourage people to write.) Simply extending instanceof to support Foo<T> (even with an unchecked warning) seems insufficient here, because in the erased case, it will say yes when all it can tell is "they're both erased Foo", and it seems like it promises more than it delivers. (And, it should be possible to write a sensible equals() method without unchecked warnings.) But all is not lost! Our friendly dependent type T.erasure saves us here too:


   if (other instanceof Foo<T.erasure>)

(This is a slight stretching of the syntax, since we're not really asking if the other is an instance of Foo<Object> in the erased case, but only slightly.) We can do the same for casting; I am not yet sure it makes sense to do the same for type literals.

*Wildcards.* Code that makes use of Foo<?> will likely want to migrate to using Foo<any> instead.

Looking at how many times T.erasure plays into the answer, you can see why I was arguing for it in the context of the API migration -- because with any of the other API approaches, we would still have the same set of problems / unchecked warnings when we get to the method body.

Take the equals() method. We would like to be able to write an equals() method once, generically for all instantiations, with no peeling and no unchecked warnings. The T.erasure approach gets us there.


If we have a class Box<T> today:

class Box<T> {
    T t;

    boolean equals(Object o) {
        if (o instanceof Box<?>) {
            Box<?> other = (Box<?>) o;
            if (t == null)
                return other.t == null;
            Object otherT = other.t;
            return t.equals(otherT);
        }
        else
            return false;
    }
}

The places where erasure is exposed to the programmer are the instanceof test against Box<?>, the cast to Box<?>, and the extraction of other.t as an Object; the programmer would like to ask if the other object is a Box<T>, cast it to a Box<T>, and extract its state as a T, but can't do so safely, so we settle for answering a looser question.
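To make the "looser question" concrete, here is a runnable variant of the class above in current Java. It is a sketch with some assumptions added so that it compiles and can be exercised: the class is renamed ErasedBox (a hypothetical name), equals is made public, and a constructor and a matching hashCode are added. The equals logic itself is unchanged.

```java
// Hypothetical runnable variant of Box<T>; constructor, hashCode and the
// public modifier are additions made only so the example can be run.
class ErasedBox<T> {
    final T t;

    ErasedBox(T t) {
        this.t = t;
    }

    @Override
    public boolean equals(Object o) {
        // All we can ask is "is it some Box?", not "is it a Box<T>?".
        if (o instanceof ErasedBox<?>) {
            ErasedBox<?> other = (ErasedBox<?>) o;
            if (t == null)
                return other.t == null;
            // The other box's state can only be viewed as an Object.
            Object otherT = other.t;
            return t.equals(otherT);
        }
        else
            return false;
    }

    @Override
    public int hashCode() {
        return t == null ? 0 : t.hashCode();
    }

    public static void main(String[] args) {
        ErasedBox<String> a = new ErasedBox<>("a");
        ErasedBox<Object> b = new ErasedBox<>("a");
        // Different static instantiations, but the erased test cannot see that.
        System.out.println(a.equals(b));
    }
}
```

Two boxes with different static type arguments compare equal whenever their contents do, because the type argument plays no part in the runtime test.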


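Here's the same class, anyfied. Apart from the "any" in the class header, the code that changes from the above is where Box<?> and Object become Box<T.erasure> and T.erasure.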

class Box<any T> {
    T t;

    boolean equals(Object o) {
        if (o instanceof Box<T.erasure>) {
            Box<T.erasure> other = (Box<T.erasure>) o;
            if (t == null)
                return other.t == null;
            T.erasure otherT = other.t;
            return t.equals(otherT);  // This is .equals(T.erasure) too
        }
        else
            return false;
    }
}

My claim here is: not only is this safe (no unchecked warnings, no heap pollution), and not only is it more generic because the domain of genericity is broadened, but that it is /less polluted by erasure/ (despite the word "erasure" appearing prominently.) By using the T.erasure type, we're able to explicitly say "use the sharpest type you can, modulo erasure" in the instanceof, cast, variable extraction, and equals contexts, and the limitations of our approximations are explicit -- and we get more type checking than we would by manually erasing things to "Object". We're working within the type system, rather than outside it.

Overall, with the language features adjusted as described (loosely) herein, we can migrate existing generic code to any-generic in a fairly localized and mechanized manner, with only a few idioms (e.g., locking) requiring any sort of peeling on the part of the user. The (incomplete) prototype of Collections in the Valhalla repo seems consistent with this theory.

Oh, and there's one more elephant in this room: serialization. Lots of work will be needed for serialization, which uses Object everywhere ... but that's another day.
