The previous discussion topic (which is still not remotely finished, you're not off the hook yet!) centered on migrating generic APIs. A related topic is that of migrating the /implementations/ of these APIs. This exploration has been informed by the prototype of Collections and Streams.

In the lucky case, no changes are needed to the bodies of methods after anyfying the API -- just adding "any" is all you need. However, not all cases are so lucky.

Here's a list of places where simply recompiling existing ref-generic code as any-generic code could run into problems.

*Nullity.* References are nullable; values are not. (We'll have a separate discussion for "nullable values", which may be desirable for migration compatibility.)

*Variance.* References are polymorphic; values are not.

*Identity.* References have identity; values do not. For example, this means that values cannot be used as the lock object for a synchronized block.

*Object methods.* Value types will almost certainly have some form of equals, hashCode, toString, and getClass; they will almost certainly not have wait, notify, or notifyAll methods, and probably not clone either.

*Relationship with Object.* All reference types are subtypes of Object; value types are not. Similarly, arrays of reference types are subtypes of Object[]; arrays of value types are not.

*Array creation.* Currently, the idiom for creating a T[] array is to create an Object[] array and statically cast it to T[]. This won't work with value arrays.

*Instanceof, casting, and type literals.* Instanceof doesn't permit a parameterized type on the RHS, only a raw type or an unbounded wildcard such as Foo<?>. That is not specific enough for specialized types, because the runtime class of List<int> is different from that of List<String>. Casting does permit a parameterized type, but currently this is interpreted as a static cast. Type literals as currently formulated are not specific enough either.

*Wildcards.* The existing wildcard Foo<?> means (and must continue to mean) Foo<? extends Object>.

When dealing with quantities that might be either a reference or a value, such as an expression of type 'T' where T is an avar, the compiler must be conservative and only allow operations that can be proven safe for both references and values. So it would have to reject, for example, assigning null to a T, since we don't know that null is a member of the domain for all T.


Obviously, we are not going to add value types and any-tvars to the language, and then not adjust the other places where type variables meet other language features. So clearly some of these issues will be addressed by extending the semantics of existing language features.

But, we don't necessarily *have* to change anything in order for things to "work". A method in an any-generic class could be "peeled" into a ref version and a value version:

class Foo<any T> {
    <where ref T>
    public void moo(T t) {
        // existing method body
    }

    <where val T>
    public void moo(T t) {
        // alternate method body, that steers clear of restrictions
    }
}

But asking users to write their code twice would be rude, so we want to keep this sort of peeling to a bare minimum and preserve it as an "escape hatch". If we're to minimize peeling, it stands to reason that either new linguistic forms need to be added, or existing forms need to be stretched to accommodate the broadened domain of genericity. Let's take the issues one at a time.

*Nullity.* We can further break this down into assignment to null and comparison to null.

Assignment to null is not going to fly. Our current prototype supports the expression T.default, which evaluates to the default value for whatever type T describes. For reference instantiations, this is null; for primitives, this is zero/false. Assignment to null can be replaced with assignment to T.default.
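
Today's Java has no T.default, but its behavior can be approximated reflectively when a class token is available, letting primitive types stand in for value types. The helper below, defaultValue, is a hypothetical sketch and not the prototype's implementation: it reads the initial contents of a freshly allocated one-element array, which the JVM fills with null for reference component types and with zero/false for primitive ones.

```java
import java.lang.reflect.Array;

public class DefaultValues {
    // Hypothetical stand-in for T.default, driven by a Class token because
    // today's erased generics cannot recover T at runtime. A new array slot
    // holds null for reference types and 0/false for primitive types;
    // Array.get boxes primitive results (e.g. int 0 -> Integer 0).
    static Object defaultValue(Class<?> type) {
        return Array.get(Array.newInstance(type, 1), 0);
    }

    public static void main(String[] args) {
        System.out.println(defaultValue(String.class));  // null
        System.out.println(defaultValue(int.class));     // 0
        System.out.println(defaultValue(boolean.class)); // false
        System.out.println(defaultValue(double.class));  // 0.0
    }
}
```

The class token is the price of doing this in today's Java; T.default needs no token because a specialized instantiation knows its own T.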

For comparison with null, there are some options. In the prototype, we currently have a peeled generic method <any T> Any.isNull(), which returns false for value invocations. However, even swapping out ==null for Any.isNull() is somewhat intrusive; instead, we can define ==null such that it constant-folds to false for value instantiations. Then existing source code is unchanged (and there's no runtime overhead for the null check in value instantiations, since it's been folded away).

When we look more closely at the possibility of nullable value types, this will have to be refined.

*Variance.* We already fold "? extends T" and "? super T" to T when T is known to be a value type. (More specifically, we treat wildcards bounded by avars as a dependent type, (if erased T then ? extends T else T)).

*Identity.* There are a few cases here -- synchronization, reference comparison to an Object, System.identityHashCode. I think it makes sense to reject synchronization, and instead ask for more explicit lock-selection logic (perhaps appealing to a peeled <any T> Object lockFor(T) method). For reference comparison to Object, we can treat this as we propose above for comparison to null -- constant fold to false. For System.identityHashCode, we can peel this into something that uses ordinary hashCode for values.
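
The peeled helpers suggested above can be approximated in today's Java with a class token, again letting primitive types stand in for value types. lockFor borrows its name from the paragraph above, but the extra Class parameter and the single shared lock for all values are assumptions of this sketch; identityHashFor is a hypothetical name.

```java
public class IdentityHelpers {
    // One shared monitor for every "value" argument. Values have no identity,
    // so there is nothing per-instance to lock on; this choice is deliberately coarse.
    private static final Object VALUE_LOCK = new Object();

    // Hypothetical peeled lockFor: a reference is its own lock (today's
    // behavior); a value (modeled by a primitive type token) gets VALUE_LOCK.
    static Object lockFor(Class<?> type, Object value) {
        return type.isPrimitive() ? VALUE_LOCK : value;
    }

    // Hypothetical peeled identityHashCode: identity hash for references,
    // ordinary hashCode for values.
    static int identityHashFor(Class<?> type, Object value) {
        return type.isPrimitive() ? value.hashCode() : System.identityHashCode(value);
    }

    public static void main(String[] args) {
        String s = "hello";
        synchronized (lockFor(String.class, s)) {
            // critical section guarded by s itself
        }
        synchronized (lockFor(int.class, 42)) {
            // critical section guarded by the shared value lock
        }
        System.out.println(identityHashFor(int.class, 42)); // 42 (Integer.hashCode)
    }
}
```

A single global lock for values is the simplest choice that is always safe, at the cost of contention; a more refined design might hand out locks per enclosing data structure instead.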

*Object methods.* We've already discussed the Objectible<any T> interface, which would define the core methods equals, hashCode, toString, and getClass. Other Object methods (wait, notify, notifyAll, and probably clone) would not be available on any-T-valued expressions. (Arguably clone() could be the identity function on values, but this may not be worth it -- cloning is pretty broken.)

*Relationship with Object.* Assignment to Object could be accepted as a possibly-autoboxed operation, but I'm not sure this is a great idea -- it might be better to have an explicit toObject() method (maybe even on Objectible). Assignment of T[] to Object[] needs to be rejected, but in most cases Object[] can be replaced with T.erasure[] (just as null is replaced with T.default).
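
Primitive arrays in today's Java already behave the way value arrays are described here, so they make a convenient stand-in for checking the subtyping claims; the class and method names below are illustrative only.

```java
public class ArraySubtyping {
    // True only for arrays of reference types: String[] is an Object[],
    // but int[] is merely an Object.
    static boolean isObjectArray(Object array) {
        return array instanceof Object[];
    }

    public static void main(String[] args) {
        System.out.println(isObjectArray(new String[2])); // true
        System.out.println(isObjectArray(new int[2]));    // false
        // Assigning a primitive to Object only compiles because of autoboxing.
        Object boxed = 42;
        System.out.println(boxed.getClass().getName());   // java.lang.Integer
    }
}
```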

*Array creation.* The current prototype supports the expression form new T[n], which downgrades to Object[] when T is a reference type (and issues an unchecked warning). Alternatively, we could provide a reflective method <any T> T[] newArray(int), also with an unchecked warning. We can make the unchecked warning go away if the new expression were new T.erasure[n] (or the library version returned T.erasure[]).
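
The difference between today's Object[]-and-cast idiom and reflective creation is visible at runtime, with primitive types again standing in for value types. Both erasedArray and reflectiveArray are hypothetical helpers; reflectiveArray takes a Class token only because erased T is unavailable at runtime, which a specialized new T[n] or newArray would not need.

```java
import java.lang.reflect.Array;

public class ArrayCreation {
    // Today's idiom: allocate an Object[] and cast it to T[]. The cast is
    // unchecked, and the runtime class is always Object[].
    @SuppressWarnings("unchecked")
    static <T> T[] erasedArray(int n) {
        return (T[]) new Object[n];
    }

    // Reflective creation from a class token. For int.class this yields a
    // genuine int[], which is not a T[] for any reference T, so the result
    // is typed as Object.
    static Object reflectiveArray(Class<?> componentType, int n) {
        return Array.newInstance(componentType, n);
    }

    public static void main(String[] args) {
        Object[] erased = ArrayCreation.<Object>erasedArray(3);
        System.out.println(erased.getClass().getSimpleName());                           // Object[]
        System.out.println(reflectiveArray(int.class, 3).getClass().getSimpleName());    // int[]
        System.out.println(reflectiveArray(String.class, 3).getClass().getSimpleName()); // String[]
    }
}
```

Assigning ArrayCreation.<String>erasedArray(3) to a String[] variable fails with a ClassCastException at the call site, because the runtime class is Object[]; that is the hazard the unchecked warning points at.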

*Instanceof, casting, and type literals.* It is straightforward enough to extend instanceof to support "instanceof Foo<int>", and similarly for casts and type literals. We can do the same for the wildcard Foo<any>.

Supporting "instanceof Foo<T>" is trickier because T might be erased, and so it might not give you the answer you expect. Currently, in a generic class Foo<T>, you can ask whether something is an instance of raw Foo or of the wildcard Foo<?>. The equivalent question with any-generics is more complicated; you want to express "If I am erased and the other is an erased Foo, OR I am not erased and the other is the same instantiation of Foo as me." (This collapses to "do they have the same runtime class?", but that's not really what we want to encourage people to write.) Simply extending instanceof to support Foo<T> (even with an unchecked warning) seems insufficient here: in the erased case it will say yes when all it can tell is "they're both erased Foo", so it promises more than it delivers. (And it should be possible to write a sensible equals() method without unchecked warnings.) But all is not lost! Our friendly dependent type T.erasure saves us here too:

   if (other instanceof Foo<T.erasure>)

(This is a slight stretching of the syntax, since we're not really asking if the other is an instance of Foo<Object> in the erased case, but only slightly.) We can do the same for casting; I am not yet sure it makes sense to do the same for type literals.

*Wildcards.* Code that makes use of Foo<?> will likely want to migrate to using Foo<any> instead.


Looking at how many times T.erasure plays into the answer, you can see why I was arguing for it in the context of the API migration -- because with any of the other API approaches, we would still have the same set of problems / unchecked warnings when we get to the method body.

Take the equals() method. We would like to be able to write an equals() method once, generically for all instantiations, with no peeling and no unchecked warnings. The T.erasure approach gets us there.

If we have a class Box<T> today:

class Box<T> {
    T t;

    public boolean equals(Object o) {
        if (o instanceof Box<?>) {
            Box<?> other = (Box<?>) o;
            if (t == null)
                return other.t == null;
            Object otherT = other.t;
            return t.equals(otherT);
        }
        else
            return false;
    }
}

The instanceof Box<?> test, the cast to Box<?>, and the extraction of other.t into an Object local are the places where erasure is exposed to the programmer: the programmer would like to ask whether the other object is a Box<T>, cast it to a Box<T>, and extract its state as a T, but can't do so safely, so we settle for answering a looser question.
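
To see concretely what the looser question costs, here is a runnable copy of the ref-generic Box with a constructor and a matching hashCode added (both additions are ours, for the demonstration). Because instanceof Box<?> cannot tell instantiations apart, a Box<String> and a Box<Integer> that both hold null compare equal.

```java
public class BoxDemo {
    static class Box<T> {
        T t;

        Box(T t) { this.t = t; }

        @Override
        public boolean equals(Object o) {
            if (o instanceof Box<?>) {
                Box<?> other = (Box<?>) o;   // erasure: only "some Box"
                if (t == null)
                    return other.t == null;
                Object otherT = other.t;     // erasure: state seen as Object
                return t.equals(otherT);
            }
            return false;
        }

        @Override
        public int hashCode() {
            return t == null ? 0 : t.hashCode();
        }
    }

    public static void main(String[] args) {
        Box<String> s = new Box<>(null);
        Box<Integer> i = new Box<>(null);
        // true: the erased check cannot distinguish Box<String> from Box<Integer>
        System.out.println(s.equals(i));
    }
}
```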

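Here's the same class, anyfied. Relative to the version above, T is declared as "any T", and Box<T.erasure> and T.erasure replace Box<?> and Object in the instanceof test, the cast, and the extraction of other.t.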

class Box<any T> {
    T t;

    public boolean equals(Object o) {
        if (o instanceof Box<T.erasure>) {
            Box<T.erasure> other = (Box<T.erasure>) o;
            if (t == null)
                return other.t == null;
            T.erasure otherT = other.t;
            return t.equals(otherT);  // This is .equals(T.erasure) too
        }
        else
            return false;
    }
}

My claim here is that not only is this safe (no unchecked warnings, no heap pollution), and not only is it more generic because the domain of genericity is broadened, but it is also /less polluted by erasure/ (despite the word "erasure" appearing prominently). By using the T.erasure type, we're able to say explicitly "use the sharpest type you can, modulo erasure" in the instanceof, cast, variable extraction, and equals contexts; the limitations of our approximations are explicit, and we get more type checking than we would by manually erasing things to Object. We're working within the type system, rather than outside it.


Overall, with the language features adjusted as described (loosely) herein, we can migrate existing generic code to any-generic in a fairly localized and mechanized manner, with only a few idioms (e.g., locking) requiring any sort of peeling on the part of the user. The (incomplete) prototype of Collections in the Valhalla repo seems consistent with this theory.


Oh, and there's one more elephant in this room: serialization. Lots of work will be needed for serialization, which uses Object everywhere ... but that's another day.
