Re: Model 3 classfile design document

Brian Goetz Tue, 02 Feb 2016 05:47:14 -0800

This is not a small question!  (Actually, depending on the interpretation of 
ParamType[List, String], it’s one of two questions; I’ll answer them both.)

What does ParameterizedType[LFoo, String] mean?  Could be one of three things.

1.  Specialize Foo with T=String; this produces a fully reified Foo<String>.  
2.  Recognize that String is a ref type, and produce an erased Foo<String>.  
3.  Recognize that String is a ref type, and produce an erased Foo<String>, but 
with metadata that allows the types to be recovered through reflection.  
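
For a rough feel for what #3 would mean, consider a loose analogy (not part of
the Model 3 proposal): Java already records generic types for declarations in
the classfile, and reflection can read them back even though the runtime
representation is erased. A minimal sketch, where the class SignatureDemo and
the field "names" are hypothetical and exist only to be inspected:

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

public class SignatureDemo {
    // Hypothetical field; it exists only so there is a declaration to inspect.
    static List<String> names;

    // Returns the name of the type argument recorded for the field's declaration.
    static String declaredTypeArgument() {
        try {
            Field field = SignatureDemo.class.getDeclaredField("names");
            ParameterizedType type = (ParameterizedType) field.getGenericType();
            Type argument = type.getActualTypeArguments()[0];
            return argument.getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(declaredTypeArgument());
    }
}
```

Note that this recovers the type of the declaration, not of any particular
object; a List instance stored in the field still carries no record of String.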

If your interpretation is #1 (which is also ours), then your 
question is: Why not “just” do reified generics?  

Alternatively, your question might be: why not do #2 or #3, and retain the type 
information for longer, to expand the range of implementation choices?  I’ll 
answer this one first.  We’d like to minimize the intrusion of Java’s generic 
type system on the JVM.  Rules like “these types are erased, but these types 
are reified” are choices that should be left to the language compiler.  Just 
because Java decides to erase, doesn’t mean Kotlin should be required to; you 
should have the choice.  And this simplifies the VM implementation too — the 
language compiler asks for erasure or reification, and the VM responds 
accordingly.  (I don’t think this is your question, and I suspect you agree 
with all this.)

Another thing you could be asking is: why does the VM need to know about 
“erased” at all?  And the reason here is fairly simple (if unfortunate): 
erasure is noncompositional enough that the compiler cannot simply erase early 
and ask the VM to propagate and substitute thereafter; doing so would lead to 
incompatible translations, and we take it as a requirement that we be 
compatible with existing uses of reference generics.  It took us a long time to 
come to such a simple model for how to capture erasure!  

Which brings us to the question that I think is your real question: given that 
we now *can* reify generics over reference types, why wouldn’t we always do 
so?  There are many reasons, including compatibility, expressibility, and 
footprint.  

Compatibility.  If we “just” reified List<String>, then existing code would be 
neither source- nor binary-compatible.  (When .NET switched to reified 
generics, you had to switch all of your libraries from the old libraries to the 
new reified libraries.)  That’s a non-starter for us; existing uses of generic 
classes (both clients and subclasses) should be source and binary compatible 
after the classes are anyfied.  (Additionally, plenty of code has assumptions 
about the result of reflective operations like .getClass() on generics, that 
could break if we reified all reference parameterizations.)  That means that 
reference instantiations need to continue to be erased.  
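
To make the getClass() point concrete, here is a minimal sketch of the kind of
assumption that existing code can make today. The class and method names are
illustrative only:

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // Under erasure, both parameterizations are backed by the same runtime class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
    }
}
```

With erased generics sameRuntimeClass() returns true.  If every reference
parameterization were reified, the two lists would presumably have distinct
runtime classes, and code that relies on this equality could change behavior.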

Some may have a hard time with this conclusion.  If you dig at this unease, I 
think the most likely explanation is the assumption that “well, reified 
generics are just better!”  But this isn’t true — both erasure and reification 
have pros and cons.  Erasure was not a “mistake” to be fixed by reification; it 
is a compromise, and I think a highly pragmatic one.  

(Some may ask: “Could we make reification an option, say at the use site, e.g., 
‘new List<reified String>’?”  We could, but I suspect that having a mix of 
reified List<String> and erased List<String> coexisting in the same heap would 
be an endless source of bugs and corner cases.)

Expressibility.  Our preference for erasure is not simply based on 
compatibility.  Real-world generic code is full of “dirty tricks” that involve 
casting through raw; sometimes this is just sloppiness or lack of expertise 
with generics, but sometimes this is the only practical way to achieve the 
desired result without incurring massive copying costs.  Truly reifying 
generics would mean that all this code would break and have to be rewritten.  
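
A minimal sketch of one such trick, assuming a caller that wants to treat a
List<String> as a List<Object> without copying it.  The names are illustrative
only:

```java
import java.util.ArrayList;
import java.util.List;

public class RawCastDemo {
    // Reinterprets the list by casting through the raw type; no elements are copied.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<Object> viewAsObjects(List<String> strings) {
        return (List<Object>) (List) strings;
    }

    // True if the view is the very same object as the original list.
    static boolean sharesStorage() {
        List<String> strings = new ArrayList<>();
        strings.add("a");
        List<Object> objects = viewAsObjects(strings);
        return objects == (Object) strings && objects.size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(sharesStorage());
    }
}
```

This works only because the cast is unchecked under erasure, and it is unsafe
if the caller then adds a non-String through the view.  With truly reified
reference parameterizations, the cast itself would presumably fail at run time.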

Footprint.  Erasure means that we can share a single class to represent all 
instantiations of a type: Map<String, String>, Map<Foot, Shoe>, etc.  Having 
separate types for each of these would involve more class loading, more class 
metadata, etc.  Yes, there are techniques for minimizing this (.NET reifies a 
parameterization token but erases at code-gen time), but there is some cost.  

My point is simply that reification is far, far from free, and erasure is not 
simply a mistake or a hack to be undone at the first opportunity.  

So, to answer your direct question: the Java compiler chooses to represent 
List<String> as ParamType[List, erased], rather than ParamType[List, String], 
for the above reasons, but Kotlin could make the opposite choice (at some Java 
interop cost).  

On Feb 1, 2016, at 5:33 AM, Andrey Breslav <[email protected]> wrote:

> A question about these examples:
> R(Foo<raw>) = Class["Foo"] or ParameterizedType['L', "Foo", "_"]
> R(Foo<String>) = Class["Foo"] or ParameterizedType['L', "Foo", "_"]
> R(Foo<int[]>) = ParameterizedType['L', "Foo", ArrayType[1, "I"]]
> Apparently, we want to preserve the information about int[], while we don't 
> care about String. Why? Isn't int[] just a class, like String?
> 
> On Fri, Jan 22, 2016 at 7:53 PM Brian Goetz <[email protected]> wrote:
> Please find a document here:
> 
> http://cr.openjdk.java.net/~briangoetz/valhalla/eg-attachments/model3-01.html
> 
> that describes our current thinking for evolving the classfile format to
> clearly and efficiently represent parametric polymorphism.  The early
> concepts of this approach were outlined in my talk at JVMLS last year;
> this represents a refinement of those ideas, and a reasonable "stake in
> the ground" description of what seems the most sensible way to balance
> preserving parametric information in the classfile without imposing
> excessive runtime costs for loading specializations.
> 
> We're working on an updated compiler prototype which people will be able
> to play with soon (along with a formal model.)
> 
> Please ask questions!
> 
> Some things this document does not address yet:
>   - How we deal with types implicit in the bytecodes (aload vs iload)
> and how they get specialized;
>   - How we represent restricted methods in the classfile;
>   - How we represent the wildcard type Foo<any>
> 
> 
> -- 
> Andrey Breslav
> Project Lead of Kotlin
> JetBrains
> http://kotlinlang.org/
> The Drive to Develop
