Will the users be able to write their own <sclinit>?
  class Foo {
      __species {
          ...
      }
  }
 
Your access bridge solution using species methods looks fine, but are we not solving that with nest mates?
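
For reference, nest-based access control (JEP 181, available from JDK 11) lets classes in the same nest call each other's private members directly, so no access$ bridge is needed for the plain, unspecialized case. The sketch below only illustrates that case; the names NestOuter, Inner and NestMateDemo are hypothetical, and it says nothing about how nest mates would interact with specialization.

```java
// Hypothetical names; requires JDK 11 or later for Class.isNestmateOf.
class NestOuter {
    private int secret() { return 42; }   // private member of the nest host

    class Inner {
        int callPrivate() {
            // Direct private access; with nest mates no access$ bridge is involved.
            return secret();
        }
    }
}

class NestMateDemo {
    public static void main(String[] args) {
        NestOuter.Inner inner = new NestOuter().new Inner();
        System.out.println(inner.callPrivate());
        System.out.println(NestOuter.Inner.class.isNestmateOf(NestOuter.class));
    }
}
```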
 
I'm also wondering whether the following are typos, or if I misunderstood them:
  • TestResolution.m_I() was not meant to be decorated with '__species'
  • TestForwardRef2.s1_S and TestForwardRef2.s2_SS don't have the correct modifiers, or should not be error cases.
  • TestTypeVar<X>.m_I() was not meant to be decorated with '__species'
 
--
Bjørn Vårdal
IBM Runtimes
 
----- Original message -----
From: Maurizio Cimadamore <[email protected]>
Sent by: "valhalla-spec-experts" <[email protected]>
To: [email protected]
Cc:
Subject: species static prototype
Date: Fri, May 27, 2016 4:56 PM
 
Hi,
over the last few days I've been busy putting together a prototype [1, 2] of javac/runtime support for species static. I guess it could be considered a prototype implementation of the approach that Bjorn has described as "Repurpose existing statics" [4] in his nice writeup. Here's what I have learned during the experience.

Parser
====

The prototype uses a no fuss approach where '__species' is the modifier to denote species static stuff (of course a better syntax will have to be picked at some point, but that's not the goal of the current exercise). This means you can write:

class Foo<X> {
   String i; //instance field
   static String s; //static field
   __species String ss; //species static field
}

This is obviously good enough for the time being.

A complication with parsing occurs when accessing species members; in fact, species members can be accessed via their fully qualified type (including all required type-arguments, if necessary).

Foo<String>.ss;
Foo<int>.ss;

The above are all valid species access expressions. Now, adding this kind of support in the parser is always tricky - as we have to battle with ambiguities which might pop up. Luckily, this pattern is similar enough to the one we use for method references - i.e. :

Foo<String>::ss

Which the compiler already had to special case; so I ended up slightly generalizing what we did in JDK 8 method reference parsing, and I got something working reasonably quickly. But this could be an area where coming up with a clean spec might be tricky (as the impl uses abundant lookahead to disambiguate this one).
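
To give a feel for the kind of lookahead involved, here is a minimal sketch that decides whether a '<' opens a type-argument list followed by '.' or '::', or is just a comparison. It is not javac's actual logic: the class SpeciesAccessLookahead, the token-list representation and the accepted token set are all assumptions made for illustration, and details such as '>>' tokens or array types are ignored.

```java
import java.util.List;

// Illustrative only: not javac's actual disambiguation logic.
class SpeciesAccessLookahead {
    /**
     * Returns true if tokens.get(lt) is a '<' that opens a balanced
     * type-argument list immediately followed by '.' or '::'.
     */
    static boolean startsTypeArguments(List<String> tokens, int lt) {
        if (lt >= tokens.size() || !tokens.get(lt).equals("<")) return false;
        int depth = 0;
        for (int i = lt; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals("<")) {
                depth++;
            } else if (t.equals(">")) {
                depth--;
                if (depth == 0) {
                    // Closing bracket found: decide on the token that follows it.
                    return i + 1 < tokens.size()
                            && (tokens.get(i + 1).equals(".") || tokens.get(i + 1).equals("::"));
                }
            } else if (!isTypeArgumentToken(t)) {
                return false;   // e.g. ';' or an operator: this was a comparison
            }
        }
        return false;           // ran out of tokens before the list closed
    }

    // Tokens are assumed to be non-empty strings.
    private static boolean isTypeArgumentToken(String t) {
        return t.equals(",") || t.equals(".") || t.equals("?")
                || Character.isJavaIdentifierStart(t.charAt(0));
    }
}
```

With this sketch, the token sequence for Foo<String>.ss is accepted at the '<', while a < b; is rejected as soon as the ';' is seen.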

Resolution
======

The basic idea is to divide the world in three static levels, whose properties are summarized in the table below:
 
            enclosing type   enclosing instance
instance    yes              yes
species     yes              no
static      no               no

So, in terms of who can access what, it follows that if we consider 'instance' to be the highest static level and 'static' to be the lowest, then it's ok for a member with static level S1 to access another member of static level S2 provided that S1 >= S2. Or, with a table:
 
from/to     instance   species   static
instance    yes        yes       yes
species     no         yes       yes
static      no         no        yes


So, let's look at a concrete example:

class TestResolution {
    static void m_S() {
        m_S(); //ok
        m_SS(); //error
        m_I(); //error
    }

    __species void m_SS() {        
        m_S(); //ok
        m_SS(); //ok

        m_I(); //error    
    }

   
    __species void m_I() {
        m_S(); //ok
        m_SS(); //ok
        m_I(); //ok
    }

}
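
The S1 >= S2 rule can also be written down as a comparison on an ordered enum. This is only a model of the from/to table above, not code from the prototype; StaticLevel and canAccess are hypothetical names.

```java
// Hypothetical model of the three static levels, ordered lowest to highest.
enum StaticLevel {
    STATIC, SPECIES, INSTANCE;

    /** A member at this level may access a member at level 'target' iff this >= target. */
    boolean canAccess(StaticLevel target) {
        return this.compareTo(target) >= 0;
    }
}

class AccessRuleDemo {
    public static void main(String[] args) {
        // Prints one line per cell of the from/to table.
        for (StaticLevel from : StaticLevel.values()) {
            for (StaticLevel to : StaticLevel.values()) {
                System.out.println(from + " -> " + to + ": " + (from.canAccess(to) ? "yes" : "no"));
            }
        }
    }
}
```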

A crucial property, of course, is that species static members can refer to any type variables in the enclosing context:

class TestTypeVar<X> {
    static void m_S() {
        X x; //error
    }

    __species void m_SS() {
         X x; //ok    
    }

    __species void m_I() {
         X x; //ok
    }

}

Nesting
=====

Another concept that needs generalization is that of allowed nesting; consider the following program:

class TestNesting1 {
    class MemberInner {
        static String s_S; //error
        String s_I; //ok
    }

    static class StaticInner {
        static String s_S; //ok
        String s_I; //ok
    }
}

That is, the compiler will only allow you to declare static members in toplevel classes or in static nested classes (which, after all, act as toplevel classes). Now that we are adding a new static level to the picture, how are the nesting rules affected?

Looking at the table above, if we consider 'instance' to be the highest static level and 'static' to be the lowest, then it's ok for a member with static level S1 to declare a member of static level S2 provided that S1 <= S2. Again, we can look at this in a tabular fashion:
 
declaring/declared   instance   species   static
instance             yes        no        no
species              yes        yes       no
static               yes        yes       yes

This also seems like a nice generalization of the current rules. The rationale behind these rules is, basically, to guarantee some invariants during member lookup; let's say that we are in a nested class with static level S1 - then, by the rule above, it follows that any member nested in this class will be able to access another member with static level S1 declared in this class or in any lexically enclosing class.

A full example of nesting rules is given below:

class TestNesting2 {
    class MemberInner {
        static String s_S; //error
        __species String s_SS; //error
        String s_I; //ok
    }

   
    __species class SpeciesInner {
        static String s_S; //error
        __species String s_SS; //ok
        String s_I; //ok
    }

    static class StaticInner {
        static String s_S; //ok       
        __species String s_SS; //ok
        String s_I; //ok
    }
}
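
The declaring/declared table is the mirror image of the access rule: a class at level S1 may declare a member at level S2 only if S1 <= S2. A minimal model of that check follows; NestingLevel and canDeclare are hypothetical names, and toplevel classes are assumed to count as STATIC.

```java
// Hypothetical model; levels are ordered lowest to highest.
enum NestingLevel {
    STATIC, SPECIES, INSTANCE;

    /** A class at this level may declare a member at level 'member' iff this <= member. */
    boolean canDeclare(NestingLevel member) {
        return this.compareTo(member) <= 0;
    }
}
```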

Unchecked access
===========

Because of an unfortunate interplay between species and erasure, code using species members is potentially unsound (the example below is a variation of an example first reported by Peter [3] on this very mailing list):

public class Foo<any T> {
    __species T cache;
}


Foo<String>.cache = "Hello";
Integer i = Foo<Integer>.cache; //whoops


To prevent cases like these, the compiler implements a check which looks at the qualifier of a species access; if such qualifier (either explicit, or implicit) cannot be proven to be reifiable, an unchecked warning is issued.

Note that it is possible to restrict such warnings only to cases where the signature of the accessed species static member changes under erasure. E.g. in the above example, accessing 'cache' is unchecked, because the type of 'cache' contains type-variables; but if another species static field was accessed whose type did not depend on type-variables, then the access should be considered sound.
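
The same hazard can be reproduced in today's Java, where a plain static field is shared by every erased parameterization. The sketch below is only an analogy to the species case (it uses an ordinary static field and instance accessors); ErasedCache and UncheckedAccessDemo are hypothetical names.

```java
// Analogy in current Java: one static slot shared by all parameterizations.
class ErasedCache<T> {
    private static Object cache;

    void put(T value) { cache = value; }

    @SuppressWarnings("unchecked")
    T get() { return (T) cache; }   // unchecked: no run-time check happens here
}

class UncheckedAccessDemo {
    /** Returns true if reading through a different parameterization fails. */
    static boolean pollutes() {
        new ErasedCache<String>().put("Hello");
        try {
            // The compiler inserts a cast to Integer at this assignment.
            Integer i = new ErasedCache<Integer>().get();
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```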


Species initializers
===========

In our model we have three static levels - but we have initialization artifacts for only two of those; we need to fix that:
 
instance    <init>
species     <sclinit>
static      <clinit>


That is, a new <sclinit> method is added to a class containing one or more species variables with an initializer. This method is used to hoist the initialization code for all the species variables.

Forward references
============

Rules for detecting forward references have to be extended accordingly. A forward reference occurs whenever there's an attempt to reference a variable from a position P, where the variable declaration occurs in a position P' > P. Currently, the rules for forward references allow an instance variable to forward-reference a static variable - as shown below:

class TestForwardRef {
   String s = s_S;
   static String s_S = "Hello!";
}

The rationale behind this is that, by the time we see the instance initializer for 's' we would have already executed the code for initializing 's_S' (as initialization will occur in different methods, <init> and <clinit> respectively, see section above). With the new static level, the forward reference rules have to be redefined according to the table below:

 
from/to     instance      species       static
instance    forward ref   ok            ok
species     illegal       forward ref   ok
static      illegal       illegal       forward ref

In other words, it's ok to forward reference a variable whose static level is lower than that available where the reference occurs. An example is given below:

class TestForwardRef2 {
   String s1_I = s_S; //ok
   String s2_I = s_SS; //ok

   String s1_S = s_S; //error!

   String s1_SS = s_S; //ok
   String s2_SS = s_SS; //error!
  
   static String s_S = "Hello!";
   __species String s_SS = "Hello Species!";
}

This is an extension of the above principle: since instance variables are initialized in <init>, they can reference variables initialized in <clinit> or <sclinit>. If a variable is initialized in <sclinit> it can similarly safely reference a variable initialized in <clinit>. Another way to think of this is that a forward reference error only occurs if the static level of the referenced symbol is the same as the static level where the reference occurs. All other cases are either illegal (i.e. because it's an attempt to go from a lower static level to a higher one) or valid (because it can be guaranteed that the code initializing the referenced variable has already been executed).
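
Put as code, the forward reference table reduces to a three-way comparison of the two static levels. The sketch below models only that table, for a reference to a variable declared later in the source; InitLevel, RefKind and ForwardRefRule are hypothetical names, not part of the prototype.

```java
enum InitLevel { STATIC, SPECIES, INSTANCE }   // lowest to highest

enum RefKind { OK, FORWARD_REF_ERROR, ILLEGAL }

class ForwardRefRule {
    /** Classifies an initializer at level 'from' reading a later-declared variable at level 'to'. */
    static RefKind classify(InitLevel from, InitLevel to) {
        int cmp = from.compareTo(to);
        if (cmp < 0) return RefKind.ILLEGAL;             // lower level reaching up
        if (cmp == 0) return RefKind.FORWARD_REF_ERROR;  // both run in the same initializer method
        return RefKind.OK;                               // target's initializer has already run
    }
}
```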

Code generation
==========

Javac currently emits invokestatic/getstatic/putstatic for both legacy static and species static access. For species access, javac uses the 'owner' field of the CONSTANT_Methodref and CONSTANT_Fieldref constants to point to the sharp type of the species access (through a constant pool type entry). Static access will always see an erased owner.

Consider this example:

class TestGen<any X> {
   __species void m_SS() { }
   static void m_S() { }

   public static void main(String[] args) {
       TestGen<String>.m_SS();
       TestGen<int>.m_SS();
       TestGen<String>.m_S();
       TestGen<int>.m_S();
   }
}

The generated code in the 'main' method is reported below:

0: invokestatic  #11                 // Method TestGen<_>.m_SS:()V
3: invokestatic  #15                 // Method TestGen<I>.m_SS:()V
6: invokestatic  #18                 // Method TestGen<_>.m_S:()V
9: invokestatic  #18                 // Method TestGen<_>.m_S:()V

As can be seen, species static access can cause a sharper type to end up in the 'owner' field of the member reference info; on the other hand, a static access always leads to an erased 'owner'.

Another detail worth mentioning is how __species is represented in the bytecode. Given the current lack of flag bits, I've opted to use the last remaining bit, 0x8000 - this is in fact the last unused bit that can be shared across class, field and method descriptors. Actually, this bit has already been used to encode the ACC_MANDATED flag in the MethodParameters attribute (as of JDK 8) - but since there's no other usage of that flag configuration outside MethodParameters it would seem safe to recycle it. Of course more compact approaches are also possible, but they would lead to different flag configurations for species static fields, methods and classes.
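
As a small illustration, a tool reading access flags could tell the three static levels apart as sketched below. ACC_SPECIES exists only in the prototype; the SpeciesFlag class and its methods are hypothetical, and the sketch assumes (as in the javap output later in this mail) that a species member carries ACC_SPECIES rather than ACC_STATIC.

```java
import java.lang.reflect.Modifier;

class SpeciesFlag {
    // Value taken from the text above; not a flag defined by any released JDK.
    static final int ACC_SPECIES = 0x8000;

    static boolean isSpecies(int accessFlags) {
        return (accessFlags & ACC_SPECIES) != 0;
    }

    /** Maps a member's access flags to its static level. */
    static String describe(int accessFlags) {
        if (isSpecies(accessFlags)) return "species";
        return Modifier.isStatic(accessFlags) ? "static" : "instance";
    }
}
```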

Specialization
=========

Specializing species access is relatively straightforward:

* both instance and species static members are copied in the specialization
* static members are only copied in the erased specialization (and skipped otherwise)
* ACC_SPECIES classes become regular classes when specialized
* ACC_SPECIES methods/fields become static methods/fields in the specialization
* <sclinit> becomes the new <clinit> in the specialization (and is omitted if the specialization is the erased specialization)

The last bullet requires some extra care when handling the 'erased' specialization; consider the following example:

class TestSpec<any X> {
   static String s_S = "HelloStatic";
   __species String s_SS = "HelloSpecies";
}

This class will end up with the following two synthetic methods:

static void <clinit>();
    descriptor: ()V
    flags: ACC_STATIC
    Code:
      stack=1, locals=0, args_size=0
         0: ldc           #8                  // String HelloStatic
         2: putstatic     #14                 // Field s_S:Ljava/lang/String;
         5: ldc           #16                 // String HelloSpecies
         7: putstatic     #19                 // Field s_SS:Ljava/lang/String;
        10: return

  species void <sclinit>();
    descriptor: ()V
    flags: ACC_SPECIES
    Code:
      stack=1, locals=1, args_size=1
         0: ldc           #16                 // String HelloSpecies
         2: putstatic     #19                 // Field s_SS:Ljava/lang/String;
         5: return

As it can be seen, the <clinit> method contains initialization code for both static and species static fields! To understand why this is so, let's consider how the specialized bits might be derived from the template class following the rules above. Let's consider a specialization like TestSpec<int>: in this case, we need to drop <clinit> (it's a static method and TestSpec<int> is not an erased specialization), and we also need to rename <sclinit> as <clinit> in the new specialization. All is fine - the specialization will contain the relevant code required to initialize its species static fields.

Let's now turn to the erased specialization TestSpec<_> - this specialization receives both static and species static members. Now, if we were to follow the same rules for initializers, we'd end up with two different initializer methods - both <clinit> and <sclinit>. We could ask the specializer to merge them somehow, but that would be tricky and expensive. Instead, we simply (i) drop <sclinit> from the erased specialization and (ii) retain <clinit>. Of course this means that <clinit> must also contain initialization code for species static members.
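
The member-copying rules, including the special handling of the two initializers, can be summarized in a few lines. This is a simplified model and not the specializer's real code: SpecializerSketch, Kind and Member are hypothetical, members are reduced to a name and a static level, and species classes are left out.

```java
import java.util.ArrayList;
import java.util.List;

class SpecializerSketch {
    enum Kind { INSTANCE, SPECIES, STATIC }

    static final class Member {
        final String name;
        final Kind kind;
        Member(String name, Kind kind) { this.name = name; this.kind = kind; }
    }

    /** Returns "level name" strings for the members that end up in one specialization. */
    static List<String> specialize(List<Member> template, boolean erased) {
        List<String> out = new ArrayList<>();
        for (Member m : template) {
            switch (m.kind) {
                case INSTANCE:
                    out.add("instance " + m.name);
                    break;
                case STATIC:
                    // Statics (including <clinit>) live only in the erased specialization.
                    if (erased) out.add("static " + m.name);
                    break;
                case SPECIES:
                    if (m.name.equals("<sclinit>")) {
                        // Erased: dropped, since <clinit> already initializes species fields.
                        if (!erased) out.add("static <clinit>");
                    } else {
                        out.add("static " + m.name);   // species members become static
                    }
                    break;
            }
        }
        return out;
    }
}
```

Feeding it the four members of TestSpec (<clinit>, <sclinit>, s_S, s_SS) gives <clinit> and s_SS for TestSpec<int>, and <clinit>, s_S and s_SS for the erased TestSpec<_>.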

Bonus point: Generic methods
===================

As pointed out by Brian, if we have species static classes we can translate static and species static specializable generic methods quite effectively. Consider this example:

class TestGenMethods {
   static <any X> void m(X x) { ... }

   void test() {
       m(42);
   }
}

without species static, this would translate to:

class TestGenMethods {
    static class TestGenMethods$m<any X> {
         void m(X z) { ... }
    }

    /* bridge */ void m(Object o) { new TestGenMethods$m().m(o); }

    void test() {
        new TestGenMethods$m<int>().m(42); // this is really done inside the BSM
    }
}

Note how the bridge (called by legacy code) will need to spin a new instance of the synthetic class and then call a method on it. The bootstrap used to dispatch static generic specializable calls also needs to do a very similar operation. But what if we turned the translated generic method into a species static method?

class TestGenMethods {
    class TestGenMethods$m<any X> {
         __species void m(X z) { ... }
    }

    /* bridge */ void m(Object o) { TestGenMethods$m.m(o); }

    void test() {
        TestGenMethods$m<int>.m(42); // this is really done inside the BSM
    }
}

With species static, we can now access the method w/o needing any extra instance. This leads to simplification in both the bridging strategy and the bootstrap implementation. We can apply a similar simplification for dispatch of specializable species static calls - the only difference is that the synthetic holder class has also to be marked as species static (since it could access type-vars from the enclosing context).

Bonus point: Access bridges
=================

Access bridges are a constant pain in the current translation strategy; such bridges are generated by the compiler to grant access to otherwise inaccessible members. Example:

class Outer<any X> {
    private void m() { }

    class Inner {
        void test() {
            m();
        }
    }
}

This code will be translated as follows:

class Outer<any X> {

    /* synthetic */ static void access$m(Outer o) { o.m(); }

    private void m() { }

    class Inner {
        /*synthetic*/ Outer this$0;

        void test() {
            access$m(this$0);
        }
    }
}

That is, access to private members is translated with an access to an accessor bridge, which then performs access from the right location. Note that the accessor bridge is static (because otherwise it would be possible to maliciously override it to grant access to otherwise inaccessible members); since it's static, usual rules apply, so it cannot refer to type-variables, it cannot be specialized, etc. This means that there are cases with specialization where existing access bridges are not enough to guarantee access - if the access happens to cross specialization boundaries (e.g. accessing m() from an Outer<int>.Inner).

Again, species static comes to the rescue:

class Outer<any X> {

    /* synthetic */ __species void access$m(Outer<X> o) { o.m(); }

    private void m() { }

    class Inner {
        /*synthetic*/ Outer this$0;

        void test() {
            Outer<X>.access$m(this$0);
        }
    }
}

Since the accessor bridge is now species static, it can now mention type variables (such as X); it also means that when the bridge is accessed (from Inner), the qualifier type (Outer<X>) is guaranteed to remain sharp from the source code to the bytecode - so when this code gets specialized, all references to X will be dealt with accordingly (and the right accessor bridge will be accessed).

Parting thoughts
==========

On many levels, species statics seem to be the missing ingredient for implementing many of the tricks of our translation strategy, as well as for making it easier to express common idioms (e.g. type-dependent caches) in user code.
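
For comparison, a type-dependent cache in today's Java has to be keyed by a Class token at run time, for instance with java.lang.ClassValue. The sketch below is only an approximation of what a species static field would give: erased parameterizations such as Foo<String> and Foo<Integer> share one Class, so they cannot be told apart this way. TypeKeyedCache and the computed string are placeholders.

```java
class TypeKeyedCache {
    // One lazily computed entry per Class token; the computed value is a stand-in.
    private static final ClassValue<String> CACHE = new ClassValue<String>() {
        @Override
        protected String computeValue(Class<?> type) {
            return "cache for " + type.getSimpleName();
        }
    };

    static String cacheFor(Class<?> type) {
        return CACHE.get(type);
    }
}
```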

Adding support for species static has proven to be harder than originally thought. This is mainly because the current world is split in two static levels: static and instance. When something is not static it's implicitly assumed to be instance, and vice versa. If we add a third static level to the picture, a lot of the existing code just doesn't work anymore, or has to be validated to check whether 'static' means 'legacy static' or 'species static' (or both).

I started the implementation by treating static, species static and instance as completely separate static levels - with different internal flags, etc. - but I soon realized that, while clean, this approach was invalidating too much of the existing implementation. More specifically, all the code snippets checking for static would now have to be updated to check for static OR species static (overriding vs. hiding, access to 'this', access to 'super', generic bridges, ...). On the other hand, the places where the semantics of species static vs. static was different were quite limited:

* membership/type substitution: a species static behaves like an instance member; the type variables of the owner are replaced into the member signature.
* resolution: we need to implement the correct access rules as shown in the tables above.
* code generation: an invokestatic involving a species static gets a sharp qualifier type

This quickly led to the realization that it was instead easier to just treat 'species static' as a special case of 'static' - and then to add finer grained logic whenever we really needed the distinction. This led to a considerably easier patch, and I think that a similar consideration will hold for the JLS.

[1] - http://hg.openjdk.java.net/valhalla/valhalla/langtools/rev/6949c3d06e8f
[2] - http://hg.openjdk.java.net/valhalla/valhalla/jdk/rev/836efde938c1
[3] - http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-February/000096.html
[4] - http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-May/000147.html

Maurizio


 
 
