We discussed two primary means to surface species-specific
members in the language: a "species" placement (name TBD) as
distinct from static and instance, or a "singleton" abstraction
(a la Scala's "object" abstraction, as Peter L suggested). We've
done some experiments comparing the two approaches.
Separately, we discussed two strategies for handling this at the
VM level: having three separate placements (ACC_STATIC,
ACC_SPECIES, and instance) or retconning ACC_STATIC to mean
"species" and using compiler trickery to simulate traditional
statics. In recent discussions, the Oracle and IBM VM folks
seemed happy enough with having a new placement (and possibly
either new bytecodes, {get,put,invoke}species, or overloading
these onto the existing *static bytecodes, with ParamTypes in
the owner field of the various XxxRef constants).
There are several places where the language itself can take
advantage of species members:
1. Reifying type variables. For an any-generic class Foo<T,U>,
the compiler can generate public static final
reflection-thingie-valued fields called "T" and "U", which means
that "aFoo.T" (as an ordinary field ref!) would evaluate to the
reflective mirror for the reified T -- if present, otherwise it
would evaluate to the reflective mirror for 'erased'.
2. Representation of generic methods. The current translation
strategy has us translating any-generic methods to classes; a
static method
    static<any T> void foo(T t) { }
translates to a class (plus an erased bridge):
    bridge static foo(Object o) { ... invoke erased specialization ... }
    static class Xxx$foo<any T> {
        void foo(T t) { ... }
    }
This means that an instance of Xxx$foo is needed to invoke the
method, even though it serves solely to carry the type
variables, which is unfortunate. If instead we translate as:
    static class Xxx$foo<any T> {
        species-static void foo(T t) { ... }
    }
then we can invoke this method via invokespecies:
    invokespecies ParamType[Xxx$foo, T_inf].foo(T_inf)
where T_inf is the erasure-normalized type inferred for T
(reified if T is a value type, `erased` if it is a reference
type). No fake receiver required.
The translation for generic instance methods is still somewhat
messier (will post separately), but still less messy than if we
also had to manage / cache a receiver.
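
Returning to item 1 (reifying type variables): to give a feel for what "aFoo.T" would provide, here is a rough emulation in today's Java, where the mirror has to be passed in by hand. The names ReifiedBox, tMirror and describeT are made up for illustration, and a null token stands in for the 'erased' mirror; under the proposal the compiler would generate the field and no token argument would be needed.

```java
// Hypothetical emulation only: today's Java erases T, so the caller must
// supply the Class token that the proposal would have the compiler generate.
final class ReifiedBox<T> {
    // Stands in for the generated field "T"; null models the 'erased' mirror.
    public final Class<T> tMirror;
    private final T value;

    ReifiedBox(Class<T> tMirror, T value) {
        this.tMirror = tMirror;
        this.value = value;
    }

    T get() {
        return value;
    }

    // Reports the reified type argument, or "erased" when no token was supplied.
    String describeT() {
        return (tMirror == null) ? "erased" : tMirror.getName();
    }
}
```

An ordinary field reference such as `box.tMirror` then plays the role that `aFoo.T` plays in the text.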
We also drafted some examples of how such a facility would be
used, writing them both with species-static and with singleton.
Examples and notes below; the summary is that in all cases, the
species-static version is either better or about as good.
1. The old favorite, caching an instantiated instance.
Species version:
    class Collections {
        private static class Holder<any T> {
            private species List<T> empty = new EmptyList<T>();
        }
        static<any T> List<T> emptyList() { return Holder<T>.empty; }
    }
Singleton version:
    class Collections {
        private singleton Holder<any T> {
            private List<T> empty = new EmptyList<T>();
        }
        static<any T> List<T> emptyList() { return Holder<T>.empty; }
    }
Note that in this case, species by itself isn't enough -- we
still need a holder class, and it's a bit ugly. Arguably we could
merge Holder into EmptyList (if that's under our control) but
because Collections is an old-style "static bag" class (aka "sin
bin"), we would still need a holder class for state. (Collections
could share a single holder for multiple things; empty list,
empty set, etc.)
Neither the species nor the singleton version seems particularly
better than the other here. (If we were putting this method on
Collection, where
it would likely go in new code since now interfaces can have
statics, the species approach would win, since we'd not need the
holder class any more.)
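
For comparison, the closest thing available in today's Java is probably a per-class cache keyed by an explicit Class token. The sketch below uses java.lang.ClassValue for that; the class name SpeciesEmptyLists and the token parameter are assumptions made for illustration, not part of the proposal. It only approximates "one `empty` per species", since erased Java has no species to hang state on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical emulation of `species List<T> empty` using a Class token as the key.
final class SpeciesEmptyLists {
    // One lazily computed, immutable empty list per Class token.
    private static final ClassValue<List<?>> EMPTY = new ClassValue<List<?>>() {
        @Override
        protected List<?> computeValue(Class<?> type) {
            // A fresh wrapper per token, so each "species" gets its own instance.
            return Collections.unmodifiableList(new ArrayList<Object>());
        }
    };

    // The cast is safe in practice because the cached list is empty and unmodifiable.
    @SuppressWarnings("unchecked")
    static <T> List<T> emptyList(Class<T> type) {
        return (List<T>) EMPTY.get(type);
    }
}
```

Repeated calls with the same token return the same cached instance, while different tokens get different instances, which is the behaviour the holder class in the text is after.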
2. Instantiation tracking.
Species version:
    class Foo<any T> {
        private species int count;
        private species List<Foo<T>> foos;
        public Foo() {
            ++count;
            foos.add(this);
        }
    }
Singleton version:
    class Foo<any T> {
        private singleton FooStuff<any T> {
            private int count;
            private List<Foo<T>> foos;
        }
        public Foo() {
            ++FooStuff<T>.count;
            FooStuff<T>.foos.add(this);
        }
    }
Because the state is directly tied to the instantiation, the
species version seems more attractive: it doesn't require an
extra artifact, and the constructor body is more straightforward.
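
To make the intended semantics concrete, here is a sketch of per-instantiation counting as it might be emulated in today's Java, with a map from Class token to counter standing in for the species field. The names Tracked and count(...) and the constructor's token argument are hypothetical; with a real species placement none of this bookkeeping would be written by hand.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical emulation of `private species int count` keyed by a Class token.
final class Tracked<T> {
    // One counter per type token; plays the role of the per-species field.
    private static final ConcurrentHashMap<Class<?>, AtomicInteger> COUNTS =
            new ConcurrentHashMap<>();

    Tracked(Class<T> type) {
        // Equivalent of `++count` in the species version of the constructor.
        COUNTS.computeIfAbsent(type, k -> new AtomicInteger()).incrementAndGet();
    }

    // Number of instances created so far for the given token (0 if none).
    static int count(Class<?> type) {
        AtomicInteger counter = COUNTS.get(type);
        return (counter == null) ? 0 : counter.get();
    }
}
```

The explicit map and token are precisely the extra machinery that a species field would make unnecessary.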
3. Implicit-like associations. Here, we're caching type
associations. For example, suppose we have a Box<T>, and we want
to cache the associated class for List<T>.
Species version:
    class Box<any T> {
        private species Class<List<T>> listClass
            = Class.forSpecialization(List, T.crass);
    }
Singleton version:
    class Box<any T> {
        private singleton ListBuddy<any T> {
            Class<List<T>> clazz
                = Class.forSpecialization(List, T.crass);
        }
    }
The extra singleton declaration feels like "noise" here, because
again the association is with the full set of type args for the
class.
4. Static factories. Arguably, it makes sense to move factories
to the types they describe.
Species version:
    interface List<any T> {
        private species List<T> empty = new EmptyList<>();
        species List<T> emptyList() { return empty; }
    }
Singleton version:
    interface List<any T> {
        private singleton Stuff<any T> {
            List<T> empty = new EmptyList<>();
        }
        species List<T> emptyList() { return Stuff<T>.empty; }
    }
In this model, you'd get an empty list with
    List<T> aList = List<T>.emptyList();
rather than
    List<T> aList = Collections.<T>emptyList();
In the latter, the type witness can be omitted; in the former it
probably can be as well, but that would be something new.
5. Typevar shredding. Here, we have separate state for
different subsets of variables. This should be the place where
the singleton approach shines.
Species version:
    class HashMap<any K, any V> {
        private static class Keys<any K> {
            species Set<K> allKeys = ...
        }
        private static class Vals<any V> {
            species Set<V> allVals = ...
        }
        void put(K k, V v) {
            Keys<K>.allKeys.add(k);
            Vals<V>.allVals.add(v);
        }
    }
Singleton version:
    class HashMap<any K, any V> {
        private singleton Keys<any K> {
            Set<K> allKeys = ...
        }
        private singleton Vals<any V> {
            Set<V> allVals = ...
        }
        void put(K k, V v) {
            Keys<K>.allKeys.add(k);
            Vals<V>.allVals.add(v);
        }
    }
But it doesn't really shine that much; the species version is
not really much worse than the singleton version, just a little
more fussy.
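
The point of shredding is that the key-side state depends only on K and the value-side state only on V, so two maps that share K share the key state regardless of V. A rough emulation in today's Java, again keyed by explicit Class tokens, might look like the following; ShreddedMap, allKeys(...) and allVals(...) are names invented for this sketch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical emulation of Keys<K>.allKeys and Vals<V>.allVals with Class tokens.
final class ShreddedMap<K, V> {
    // State indexed by the key type only.
    private static final ClassValue<Set<Object>> KEYS = new ClassValue<Set<Object>>() {
        @Override
        protected Set<Object> computeValue(Class<?> type) {
            return ConcurrentHashMap.newKeySet();
        }
    };
    // State indexed by the value type only.
    private static final ClassValue<Set<Object>> VALS = new ClassValue<Set<Object>>() {
        @Override
        protected Set<Object> computeValue(Class<?> type) {
            return ConcurrentHashMap.newKeySet();
        }
    };

    private final Class<K> keyType;
    private final Class<V> valType;

    ShreddedMap(Class<K> keyType, Class<V> valType) {
        this.keyType = keyType;
        this.valType = valType;
    }

    // Records the key and the value in their separately shredded sets.
    void put(K k, V v) {
        KEYS.get(keyType).add(k);
        VALS.get(valType).add(v);
    }

    static Set<Object> allKeys(Class<?> keyType) {
        return KEYS.get(keyType);
    }

    static Set<Object> allVals(Class<?> valType) {
        return VALS.get(valType);
    }
}
```

A ShreddedMap<String, Integer> and a ShreddedMap<String, Long> would both record keys into the set for String, while their values land in different sets.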
In cases where the singleton approach is more natural, the
corresponding "species in static class" idiom isn't so bad
either. But in cases where the species approach is more natural,
there's something unappealing about creating classes (both in
source and runtime footprint) in cases 2/3/4 when we don't need
one. The only place where the singleton approach seems to win big
is when there are multiple variables in the same scope bound by
invariants -- here, the singleton having a ctor is a big win --
but how often does this happen?
So our conclusion is that the species-placement is as good or
better for the identified use cases -- and it also fits cleanly
into the existing model for member placement.