Re: species static prototype

Maurizio Cimadamore Wed, 01 Jun 2016 13:20:18 -0700


On 01/06/16 19:52, Bjorn B Vardal wrote:

Will the users be able to write their own <sclinit>?

  class Foo {
      __species {
          ...
      }
  }

Hi Bjorn,

Yep - that is supported.

Your access bridge solution using species methods looks fine, but are we not solving that with nest mates? I'm also wondering whether the following are typos, or if I misunderstood them:
  * TestResolution.m_I() was not meant to be decorated with '__species'

Right - that's a typo, the 'species' modifier was meant to be omitted (i.e. it's an instance method).


  * TestForwardRef2.s1_S and TestForwardRef2.s2_SS don't have the
    correct modifiers, or should not be error cases.

Yeah - missing static and species there - in general members with _S are meant to be static, those with _SS are meant to be 'species'.


  * TestTypeVar<X>.m_I() was not meant to be decorated with '__species'

Yep - same as above

Sorry for the typos!

Maurizio

--
Bjørn Vårdal
IBM Runtimes

    ----- Original message -----
    From: Maurizio Cimadamore <[email protected]>
    Sent by: "valhalla-spec-experts"
    <[email protected]>
    To: [email protected]
    Cc:
    Subject: species static prototype
    Date: Fri, May 27, 2016 4:56 PM

    Hi,
    over the last few days I've been busy putting together a prototype
    [1, 2] of javac/runtime support for species static. I guess it
    could be considered a prototype implementation of the approach
    that Bjorn has described as "Repurpose existing statics" [4] in
    his nice writeup. Here's what I have learned during the experience.

    Parser
    ====

    The prototype uses a no-fuss approach where '__species' is the
    modifier to denote species static stuff (of course a better syntax
    will have to be picked at some point, but that's not the goal of
    the current exercise). This means you can write:

    class Foo<X> {
       String i; //instance field
       static String s; //static field
       __species String ss; //species static field
    }

    This is obviously good enough for the time being.

    A complication with parsing occurs when accessing species members;
    in fact, species members can be accessed via their fully qualified
    type (including all required type-arguments, if necessary).

    Foo<String>.ss;
    Foo<int>.ss;

    The above are all valid species access expressions. Now, adding
    this kind of support in the parser is always tricky - as we have
    to battle with ambiguities which might pop up. Luckily, this
    pattern is similar enough to the one we use for method references,
    i.e.:

    Foo<String>::ss

    Which the compiler already had to special-case; so I ended up
    slightly generalizing what we did in JDK 8 method reference
    parsing, and I got something working reasonably quickly. But this
    could be an area where coming up with a clean spec might be tricky
    (as the impl uses abundant lookahead to disambiguate this one).
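    The kind of lookahead described above can be illustrated with a
    hypothetical token-level check. This is not javac's implementation;
    it assumes an already tokenized input in which '>>' has been split
    into two '>' tokens, and the names QualifierLookahead and
    isGenericQualifier are invented for this sketch. It scans a
    balanced '<...>' run and only accepts it as a type qualifier if
    what follows is '.' (species access) or '::' (method reference).

```java
import java.util.List;

// Hypothetical sketch, not javac's parser: decides by lookahead whether the tokens
// at 'start' (an identifier followed by '<') form a type qualifier such as
// Foo<String>.ss or Foo<int>::ss rather than the start of a relational expression.
// Assumes pre-tokenized input in which '>>' has already been split into two '>'.
final class QualifierLookahead {

    static boolean isGenericQualifier(List<String> tokens, int start) {
        if (start + 1 >= tokens.size()
                || !isIdentifier(tokens.get(start))
                || !tokens.get(start + 1).equals("<")) {
            return false;
        }
        int depth = 0;
        for (int i = start + 1; i < tokens.size(); i++) {
            String t = tokens.get(i);
            switch (t) {
                case "<":
                    depth++;
                    break;
                case ">":
                    depth--;
                    if (depth == 0) {
                        // Brackets balanced: a qualifier must be followed by '.' or '::'.
                        if (i + 1 >= tokens.size()) {
                            return false;
                        }
                        String next = tokens.get(i + 1);
                        return next.equals(".") || next.equals("::");
                    }
                    break;
                case ",":
                case "?":
                case ".":
                case "[":
                case "]":
                    break; // punctuation that may appear inside type arguments
                default:
                    // Type names and primitive keywords such as 'int' pass this check;
                    // literals and operators (e.g. '10', '+') do not.
                    if (!isIdentifier(t)) {
                        return false;
                    }
            }
        }
        return false; // ran out of tokens before the brackets balanced
    }

    private static boolean isIdentifier(String t) {
        return !t.isEmpty()
                && Character.isJavaIdentifierStart(t.charAt(0))
                && t.chars().skip(1).allMatch(Character::isJavaIdentifierPart);
    }
}
```

    For example, the tokens of Foo<String>.ss and Foo<List<String>>.ss
    are accepted, while i < 10 is rejected as soon as the literal is
    seen. The unbounded scan is exactly what makes a clean grammar-level
    specification harder to write.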

    Resolution
    ======

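    The basic idea is to divide the world into three static levels,
    whose properties are summarized in the table below (whether code
    at that level can refer to the enclosing type, i.e. its type
    variables, and to the enclosing instance):

                enclosing type    enclosing instance
    instance    yes               yes
    species     yes               no
    static      no                no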


    So, in terms of who can access what, it follows that if we
    consider 'instance' to be the highest static level and 'static' to
    be the lowest, then it's ok for a member with static level S1 to
    access another member of static level S2 provided that S1 >= S2.
    Or, with a table:
    from/to     instance    species    static
    instance    yes         yes        yes
    species     no          yes        yes
    static      no          no         yes
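    Both tables can be encoded in a few lines: if the levels are
    declared from lowest to highest, the access rule "S1 may access S2
    iff S1 >= S2" is just an ordinal comparison. The names StaticLevel,
    hasEnclosingType, hasEnclosingInstance and canAccess are made up
    for this sketch; they are not taken from the prototype.

```java
// Sketch of the three static levels, declared from lowest (STATIC) to highest
// (INSTANCE) so that enum order matches "static level" order. Field names follow
// the column headers of the properties table; they are not prototype names.
enum StaticLevel {
    STATIC(false, false),
    SPECIES(true, false),
    INSTANCE(true, true);

    final boolean hasEnclosingType;      // may refer to the enclosing type's type variables
    final boolean hasEnclosingInstance;  // may refer to the enclosing instance ('this')

    StaticLevel(boolean hasEnclosingType, boolean hasEnclosingInstance) {
        this.hasEnclosingType = hasEnclosingType;
        this.hasEnclosingInstance = hasEnclosingInstance;
    }

    // A member at this level may access a member at level 'target' iff this >= target.
    boolean canAccess(StaticLevel target) {
        return compareTo(target) >= 0;
    }
}

final class AccessTable {
    // Prints the from/to table in the same row and column order as the text.
    public static void main(String[] args) {
        StaticLevel[] order = {StaticLevel.INSTANCE, StaticLevel.SPECIES, StaticLevel.STATIC};
        System.out.printf("%-10s", "from/to");
        for (StaticLevel to : order) {
            System.out.printf("%-10s", to.name().toLowerCase());
        }
        System.out.println();
        for (StaticLevel from : order) {
            System.out.printf("%-10s", from.name().toLowerCase());
            for (StaticLevel to : order) {
                System.out.printf("%-10s", from.canAccess(to) ? "yes" : "no");
            }
            System.out.println();
        }
    }
}
```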



    So, let's look at a concrete example:

    class TestResolution {
        static void m_S() {
            m_S(); //ok
            m_SS(); //error
            m_I(); //error
        }

        __species void m_SS() {
            m_S(); //ok
            m_SS(); //ok
            m_I(); //error
        }

        void m_I() {
            m_S(); //ok
            m_SS(); //ok
            m_I(); //ok
        }
    }

    A crucial property, of course, is that species static members can
    refer to any type variables in the enclosing context:

    class TestTypeVar<X> {
        static void m_S() {
            X x; //error
        }

        __species void m_SS() {
            X x; //ok
        }

        void m_I() {
            X x; //ok
        }
    }

    Nesting
    =====

    Another concept that needs generalization is that of allowed
    nesting; consider the following program:

    class TestNesting1 {
        class MemberInner {
            static String s_S; //error
            String s_I; //ok
        }

        static class StaticInner {
            static String s_S; //ok
            String s_I; //ok
        }
    }

    That is, the compiler will only allow you to declare static
    members in toplevel classes or in static nested classes (which,
    after all, act as toplevel classes). Now that we are adding a new
    static level to the picture, how are the nesting rules affected?

    Looking at the table above, if we consider 'instance' to be the
    highest static level and 'static' to be the lowest, then it's ok
    for a member with static level S1 to declare a member of static
    level S2 provided that S1 <= S2. Again, we can look at this in a
    tabular fashion:
    declaring/declared    instance    species    static
    instance              yes         no         no
    species               yes         yes        no
    static                yes         yes        yes
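    The nesting rule is the access rule read in the opposite direction,
    which is why the lookup invariant described next holds. A minimal
    sketch, with hypothetical names (Level, canDeclare, canAccess,
    invariantHolds), that encodes both rules and checks the invariant
    exhaustively:

```java
// Levels declared from lowest to highest, as in the text.
enum Level { STATIC, SPECIES, INSTANCE }

final class NestingRules {
    // A class at level 'declaring' may declare a member at level 'declared'
    // iff declaring <= declared.
    static boolean canDeclare(Level declaring, Level declared) {
        return declaring.compareTo(declared) <= 0;
    }

    // Access rule from the Resolution section: 'from' may access 'to' iff from >= to.
    static boolean canAccess(Level from, Level to) {
        return from.compareTo(to) >= 0;
    }

    // The invariant: anything a class at level s1 may declare can, in turn,
    // access members at level s1.
    static boolean invariantHolds() {
        for (Level s1 : Level.values()) {
            for (Level member : Level.values()) {
                if (canDeclare(s1, member) && !canAccess(member, s1)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

    The check succeeds by construction: canDeclare(s1, m) and
    canAccess(m, s1) are the same inequality, s1 <= m.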


    This also seems like a nice generalization of the current rules.
    The rationale behind these rules is, basically, to guarantee some
    invariants during member lookup: let's say that we are in a nested
    class with static level S1 - then, by the rule above, it follows
    that any member nested in this class will be able to access
    another member with static level S1 declared in this class or in
    any lexically enclosing class.

    A full example of nesting rules is given below:

    class TestNesting2 {
        class MemberInner {
            static String s_S; //error
            __species String s_SS; //error
            String s_I; //ok
        }


        __species class SpeciesInner {
            static String s_S; //error
            __species String s_SS; //ok
            String s_I; //ok
        }

        static class StaticInner {
            static String s_S; //ok
            __species String s_SS; //ok
            String s_I; //ok
        }
    }

    Unchecked access
    ===========

    Because of an unfortunate interplay between species and erasure,
    code using species members is potentially unsound (the example
    below is a variation of one first posted by Peter [3] on this
    very mailing list):

    public class Foo<any T> {
        __species T cache;
    }


    Foo<String>.cache = "Hello";
    Integer i = Foo<Integer>.cache; //whoops

    To prevent cases like these, the compiler implements a check which
    looks at the qualifier of a species access; if such a qualifier
    (either explicit or implicit) cannot be proven to be reifiable,
    an unchecked warning is issued.
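    The same failure can be reproduced in today's Java, where a static
    field of a generic class is shared by every parameterization. This
    is only an analogy: Foo, setCache and getCache are illustrative
    names, and the shared static slot stands in for the single erased
    species that Foo<String> and Foo<Integer> would both map to.

```java
// Today's Java analogue (no species statics): a static field of a generic class
// is shared by every parameterization, so "Foo<String>" and "Foo<Integer>" see
// the same slot. Foo, setCache and getCache are illustrative names.
final class Foo<T> {
    private static Object cache; // one slot for all Foo<...> after erasure

    static <U> void setCache(U value) {
        cache = value;
    }

    @SuppressWarnings("unchecked") // U is erased, so this cast checks nothing
    static <U> U getCache() {
        return (U) cache;
    }
}

final class UncheckedDemo {
    static String run() {
        Foo.<String>setCache("Hello");
        try {
            Integer i = Foo.<Integer>getCache(); // compiler-inserted cast to Integer fails
            return "read " + i;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "ClassCastException"
    }
}
```

    As with Foo<Integer>.cache above, the failure surfaces far from the
    write that caused it, which is why the warning is attached to the
    non-reifiable access rather than deferred to run time.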

    Note that it is possible to restrict such warnings only to cases
    where the signature of the accessed species static member changes
    under erasure. E.g. in the above example, accessing 'cache' is
    unchecked, because the type of 'cache' contains type-variables;
    but if another species static field was accessed whose type did
    not depend on type-variables, then the access should be considered
    sound.


    Species initializers
    ===========

    In our model we have three static levels - but we have
    initialization artifacts for only two of those; we need to fix that:
    instance    <init>
    species     <sclinit>
    static      <clinit>



    That is, a new <sclinit> method is added to a class containing one
    or more species variables with an initializer. This method is used
    to hoist the initialization code for all the species variables.
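    Running an initializer once per species resembles what
    java.lang.ClassValue already offers in today's Java: a value
    computed lazily, once per Class object. The sketch below is an
    analogy only, not how the prototype works; keying on the type
    argument's Class (String.class, int.class) and the name
    SpeciesCounter are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Analogy only, not the prototype's mechanism: java.lang.ClassValue lazily computes
// one value per Class, much like running a species initializer once per species.
// Keying on the type argument's Class object is an assumption of this sketch.
final class SpeciesCounter {
    static final AtomicInteger initializerRuns = new AtomicInteger();

    // computeValue plays the role of <sclinit>: it runs on first access for a given
    // key and its result is cached for that key. Under contention it may run more
    // than once, but only one result is kept.
    private static final ClassValue<AtomicInteger> COUNTER = new ClassValue<AtomicInteger>() {
        @Override
        protected AtomicInteger computeValue(Class<?> typeArg) {
            initializerRuns.incrementAndGet();
            return new AtomicInteger(); // per-"species" state, starts at 0
        }
    };

    // Roughly "++Foo<typeArg>.counter" in the species static world.
    static int increment(Class<?> typeArg) {
        return COUNTER.get(typeArg).incrementAndGet();
    }
}
```

    Calling increment(String.class) twice and increment(int.class) once
    yields 1, 2 and 1, and the initializer body runs exactly twice, once
    per key.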

    Forward references
    ============

    Rules for detecting forward references have to be extended
    accordingly. A forward reference occurs whenever there's an
    attempt to reference a variable from a position P, where the
    variable declaration occurs in a position P' > P. Currently, the
    rules for forward references allow an instance variable to
    forward-reference a static variable - as shown below:

    class TestForwardRef {
       String s = s_S;
       static String s_S = "Hello!";
    }

    The rationale behind this is that, by the time the initializer for
    's' runs, the code initializing 's_S' has already been executed
    (as initialization occurs in different methods, <init> and
    <clinit> respectively, see section above).
    With the new static level, the forward reference rules have to be
    redefined according to the table below:

    from/to     instance       species        static
    instance    forward ref    ok             ok
    species     illegal        forward ref    ok
    static      illegal        illegal        forward ref


    In other words, it's ok to forward reference a variable whose
    static level is lower than that available where the reference
    occurs. An example is given below:

    class TestForwardRef2 {
        String s1_I = s_S; //ok
        String s2_I = s_SS; //ok

        static String s1_S = s_S; //error!

        __species String s1_SS = s_S; //ok
        __species String s2_SS = s_SS; //error!

        static String s_S = "Hello!";
        __species String s_SS = "Hello Species!";
    }

    This is an extension of the above principle: since instance
    variables are initialized in <init>, they can reference variables
    initialized in <clinit> or <sclinit>. If a variable is initialized
    in <sclinit> it can similarly safely reference a variable
    initialized in <clinit>. Another way to think of this is that a
    forward reference error only occurs if the static level of the
    referenced symbol is the same as the static level where the
    reference occurs. All other cases are either illegal (i.e. because
    it's an attempt to go from a lower static level to a higher one)
    or valid (because it can be guaranteed that the code initializing
    the referenced variable has already been executed).
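    The rule just stated (error only at the same level, illegal upward,
    fine downward) fits in one comparison. A sketch with hypothetical
    names, classifying a reference to a variable that is declared
    textually later in the same class:

```java
// Levels from lowest to highest.
enum Lvl { STATIC, SPECIES, INSTANCE }

enum RefResult { OK, FORWARD_REF_ERROR, ILLEGAL }

final class ForwardRefRules {
    // Classifies a reference, made from an initializer at level 'from', to a
    // variable at level 'to' that is declared textually later in the same class.
    static RefResult classifyForward(Lvl from, Lvl to) {
        int cmp = from.compareTo(to);
        if (cmp < 0) {
            // Lower level reaching a higher one: illegal regardless of position.
            return RefResult.ILLEGAL;
        }
        if (cmp == 0) {
            // Same initializer method (<init>, <sclinit> or <clinit>): the target's
            // initializer has not run yet.
            return RefResult.FORWARD_REF_ERROR;
        }
        // The target is initialized by a method that has already run.
        return RefResult.OK;
    }
}
```

    Applied to TestForwardRef2, this gives OK for s1_I, s2_I and s1_SS,
    and FORWARD_REF_ERROR for s1_S and s2_SS.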

    Code generation
    ==========

    Javac currently emits invokestatic/getstatic/putstatic for both
    legacy static and species static access. For species static
    access, javac uses the 'owner' (class_index) field of the
    CONSTANT_Methodref/CONSTANT_Fieldref constant to point to the
    sharp type of the species access (through a constant pool type
    entry). Legacy static access always sees an erased owner.

    Consider this example:

    class TestGen<any X> {
       __species void m_SS() { }
       static void m_S() { }

       public static void main(String[] args) {
           TestGen<String>.m_SS();
           TestGen<int>.m_SS();
           TestGen<String>.m_S();
           TestGen<int>.m_S();
       }
    }

    The generated code in the 'main' method is reported below:

    0: invokestatic  #11                 // Method TestGen<_>.m_SS:()V
    3: invokestatic  #15                 // Method TestGen<I>.m_SS:()V
    6: invokestatic  #18                 // Method TestGen<_>.m_S:()V
    9: invokestatic  #18                 // Method TestGen<_>.m_S:()V

    As can be seen, a species static access can cause a sharper type
    to end up in the 'owner' field of the member reference; on the
    other hand, a static access always leads to an erased 'owner'.
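    The owner selection visible in the listing can be summarized as a
    small function. This is an illustrative model, not javac code; it
    assumes, from the javap output above, that reference type arguments
    are spelled '_' and primitive ones by their JVM descriptor letter
    (int as I), and it extends that to the other primitives by the
    standard descriptor letters.

```java
import java.util.Map;

// Illustrative model of the owner recorded for a static-style call in the listing
// above. Assumes reference type arguments print as '_' and primitive ones by their
// JVM descriptor letter (e.g. int -> I); OwnerModel is a hypothetical name.
final class OwnerModel {
    private static final Map<String, String> PRIMITIVE_DESCRIPTORS = Map.of(
            "boolean", "Z", "byte", "B", "char", "C", "short", "S",
            "int", "I", "long", "J", "float", "F", "double", "D");

    static String owner(String className, String typeArg, boolean speciesAccess) {
        if (!speciesAccess) {
            // Legacy static access: always the erased owner.
            return className + "<_>";
        }
        // Species static access: the owner is as sharp as the type argument allows.
        return className + "<" + PRIMITIVE_DESCRIPTORS.getOrDefault(typeArg, "_") + ">";
    }
}
```

    The four calls in TestGen.main map to TestGen<_>, TestGen<I>,
    TestGen<_> and TestGen<_>, matching constant pool entries #11, #15
    and #18 (twice).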

    Another detail worth mentioning is how __species is represented in
    the bytecode. Given the shortage of free flag bits, I've opted to
    use 0x8000 - this is in fact the last unused bit that can be
    shared across the class, field and method access flags. This bit
    has already been used to encode the ACC_MANDATED flag in the
    MethodParameters attribute (as of JDK 8) - but since there's no
    other usage of that flag value outside MethodParameters it would
    seem safe to recycle it. Of course more compact approaches are
    also possible, but they would lead to different flag
    configurations for species static fields, methods and classes.
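    The claim that 0x8000 is the last bit free for all three kinds can
    be checked mechanically against the access flags of the Java SE 8
    JVM Specification. Each kind on its own still has free bits (for
    example, classes do not use 0x0002, ACC_PRIVATE), but their union
    covers every bit below 0x8000. ACC_SPECIES is the prototype's flag;
    the class and method names are invented for this sketch.

```java
// Access flags from the JVM Specification (Java SE 8), grouped by where they may
// appear. ACC_SPECIES is the prototype's flag; the other values are standard.
final class FlagBits {
    static final int CLASS_FLAGS =
            0x0001    // ACC_PUBLIC
          | 0x0010    // ACC_FINAL
          | 0x0020    // ACC_SUPER
          | 0x0200    // ACC_INTERFACE
          | 0x0400    // ACC_ABSTRACT
          | 0x1000    // ACC_SYNTHETIC
          | 0x2000    // ACC_ANNOTATION
          | 0x4000;   // ACC_ENUM

    static final int FIELD_FLAGS =
            0x0001 | 0x0002 | 0x0004 | 0x0008 | 0x0010 // PUBLIC PRIVATE PROTECTED STATIC FINAL
          | 0x0040 | 0x0080                           // VOLATILE TRANSIENT
          | 0x1000 | 0x4000;                          // SYNTHETIC ENUM

    static final int METHOD_FLAGS =
            0x0001 | 0x0002 | 0x0004 | 0x0008 | 0x0010 // PUBLIC PRIVATE PROTECTED STATIC FINAL
          | 0x0020 | 0x0040 | 0x0080                  // SYNCHRONIZED BRIDGE VARARGS
          | 0x0100 | 0x0400 | 0x0800                  // NATIVE ABSTRACT STRICT
          | 0x1000;                                   // SYNTHETIC

    static final int ACC_SPECIES = 0x8000; // the bit chosen by the prototype

    // Bits of the 16-bit access_flags item that none of the three kinds use.
    static int commonFreeBits() {
        return ~(CLASS_FLAGS | FIELD_FLAGS | METHOD_FLAGS) & 0xFFFF;
    }

    static boolean isSpecies(int accessFlags) {
        return (accessFlags & ACC_SPECIES) != 0;
    }
}
```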

    Specialization
    =========

    Specializing species access is relatively straightforward:

    * both instance and species static members are copied in the
    specialization
    * static members are only copied in the erased specialization (and
    skipped otherwise)
    * ACC_SPECIES classes become regular classes when specialized
    * ACC_SPECIES methods/fields become static methods/fields in the
    specialization
    * <sclinit> becomes the new <clinit> in the specialization (and is
    omitted if the specialization is the erased specialization)

    The last bullet requires some extra care when handling the
    'erased' specialization; consider the following example:

    class TestSpec<any X> {
       static String s_S = "HelloStatic";
       __species String s_SS = "HelloSpecies";
    }

    This class will end up with the following two synthetic methods:

    static void <clinit>();
      descriptor: ()V
      flags: ACC_STATIC
      Code:
        stack=1, locals=0, args_size=0
           0: ldc           #8                  // String HelloStatic
           2: putstatic     #14                 // Field s_S:Ljava/lang/String;
           5: ldc           #16                 // String HelloSpecies
           7: putstatic     #19                 // Field s_SS:Ljava/lang/String;
          10: return

    species void <sclinit>();
      descriptor: ()V
      flags: ACC_SPECIES
      Code:
        stack=1, locals=1, args_size=1
           0: ldc           #16                 // String HelloSpecies
           2: putstatic     #19                 // Field s_SS:Ljava/lang/String;
           5: return

    As can be seen, the <clinit> method contains initialization
    code for both static and species static fields! To understand why
    this is so, let's consider how the specialized bits might be
    derived from the template class following the rules above. Let's
    consider a specialization like TestSpec<int>: in this case, we
    need to drop <clinit> (it's a static method and TestSpec<int> is
    not an erased specialization), and we also need to rename
    <sclinit> as <clinit> in the new specialization. All is fine - the
    specialization will contain the relevant code required to
    initialize its species static fields.

    Let's now turn to the erased specialization TestSpec<_> - this
    specialization receives both static and species static members.
    Now, if we were to follow the same rules for initializers, we'd
    end up with two different initializer methods - both <clinit> and
    <sclinit>. We could ask the specializer to merge them somehow, but
    that would be tricky and expensive. Instead, we simply (i) drop
    <sclinit> from the erased specialization and (ii) retain <clinit>.
    Of course this means that <clinit> must also contain
    initialization code for species static members.
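    The member-level rules, including the special handling of the
    erased specialization, can be captured in a toy specializer. It
    models members only (the rule that ACC_SPECIES classes become
    regular classes is left out), and Kind, Member and Specializer are
    hypothetical names, not the prototype's specializer API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the specialization rules for members; class-level rules
// (ACC_SPECIES classes becoming regular classes) are left out.
enum Kind { INSTANCE, SPECIES, STATIC }

final class Member {
    final String name;
    final Kind kind;

    Member(String name, Kind kind) {
        this.name = name;
        this.kind = kind;
    }

    @Override
    public String toString() {
        return kind.name().toLowerCase() + " " + name;
    }
}

final class Specializer {
    static List<Member> specialize(List<Member> template, boolean erased) {
        List<Member> out = new ArrayList<>();
        for (Member m : template) {
            switch (m.kind) {
                case INSTANCE:
                    out.add(m); // instance members are always copied
                    break;
                case SPECIES:
                    if (m.name.equals("<sclinit>")) {
                        // Non-erased: <sclinit> becomes the new <clinit>.
                        // Erased: dropped, since <clinit> already initializes species fields.
                        if (!erased) {
                            out.add(new Member("<clinit>", Kind.STATIC));
                        }
                    } else {
                        out.add(new Member(m.name, Kind.STATIC)); // ACC_SPECIES -> static
                    }
                    break;
                case STATIC:
                    if (erased) {
                        out.add(m); // legacy statics (incl. <clinit>) only in the erased one
                    }
                    break;
            }
        }
        return out;
    }
}
```

    For TestSpec, the template [static s_S, species s_SS, static
    <clinit>, species <sclinit>] yields [static s_SS, static <clinit>]
    for TestSpec<int> and [static s_S, static s_SS, static <clinit>]
    for TestSpec<_>.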

    Bonus point: Generic methods
    ===================

    As pointed out by Brian, if we have species static classes we can
    translate static and species static specializable generic methods
    quite effectively. Consider this example:

    class TestGenMethods {
       static <any X> void m(X x) { ... }

       void test() {
           m(42);
       }
    }

    Without species static, this would translate to:

    class TestGenMethods {
        static class TestGenMethods$m<any X> {
             void m(X z) { ... }
        }

        /* bridge */ static void m(Object o) { new TestGenMethods$m().m(o); }

        void test() {
            new TestGenMethods$m<int>().m(42); // this is really done inside the BSM
        }
    }

    Note how the bridge (called by legacy code) will need to spin a
    new instance of the synthetic class and then call a method on it.
    The bootstrap used to dispatch static generic specializable calls
    also needs to do a very similar operation. But what if we turned
    the translated generic method into a species static method?

    class TestGenMethods {
        static class TestGenMethods$m<any X> {
             __species void m(X z) { ... }
        }

        /* bridge */ static void m(Object o) { TestGenMethods$m.m(o); }

        void test() {
            TestGenMethods$m<int>.m(42); // this is really done inside the BSM
        }
    }

    With species static, we can now access the method without needing
    any extra instance. This simplifies both the bridging strategy and
    the bootstrap implementation. We can apply a similar
    simplification to the dispatch of specializable species static
    calls - the only difference is that the synthetic holder class
    also has to be marked as species static (since it could access
    type-vars from the enclosing context).

    Bonus point: Access bridges
    =================

    Access bridges are a constant pain in the current translation
    strategy; such bridges are generated by the compiler to grant
    access to otherwise inaccessible members. Example:

    class Outer<any X> {
        private void m() { }

        class Inner {
            void test() {
                m();
            }
        }
    }

    This code will be translated as follows:

    class Outer<any X> {

        /* synthetic */ static void access$m(Outer o) { o.m(); }

        private void m() { }

        class Inner {
            /*synthetic*/ Outer this$0;

            void test() {
                access$m(this$0);
            }
        }
    }

    That is, an access to a private member is translated into a call
    to an accessor bridge, which then performs the access from the
    right location. Note that the accessor bridge is static (because
    otherwise it would be possible to maliciously override it to grant
    access to otherwise inaccessible members); since it's static, the
    usual rules apply, so it cannot refer to type-variables, it cannot
    be specialized, etc. This means that there are cases with
    specialization where existing access bridges are not enough to
    guarantee access - if the access happens to cross specialization
    boundaries (e.g. accessing m() from an Outer<int>.Inner).

    Again, species static comes to the rescue:

    class Outer<any X> {

        /* synthetic */ __species void access$m(Outer<X> o) { o.m(); }

        private void m() { }

        class Inner {
            /*synthetic*/ Outer this$0;

            void test() {
                Outer<X>.access$m(this$0);
            }
        }
    }

    Since the accessor bridge is now species static, it means it can
    now mention type variables (such as X); and it also means that
    when the bridge is accessed (from Inner), the qualifier type
    (Outer<X>) is guaranteed to remain sharp from the source code to
    the bytecode - which means that when this code gets
    specialized, all references to X will be dealt with accordingly
    (and the right accessor bridge will be accessed).
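    For comparison with the nest-mates question at the top of the
    thread: nest-based access control later shipped as JEP 181 in
    Java 11, and with it javac no longer needs access$ bridges for
    private access between a class and its nested classes. The sketch
    below assumes Java 11 or later and mirrors Outer/Inner without the
    type parameter; it says nothing about whether nest membership would
    extend across specialization boundaries, which is the case the
    species static bridge addresses.

```java
import java.util.Arrays;

// Mirrors the Outer/Inner example without the type parameter. Requires Java 11+,
// where nestmates (JEP 181) let Inner call Outer's private method directly.
final class Outer {
    private String m() {
        return "Outer.m";
    }

    final class Inner {
        String test() {
            return m(); // private access within the nest, no access$m bridge needed
        }
    }

    String callThroughInner() {
        return new Inner().test();
    }
}

final class NestDemo {
    // True if the class contains a synthetic accessor in the style of access$m.
    static boolean hasAccessBridge(Class<?> c) {
        return Arrays.stream(c.getDeclaredMethods())
                .anyMatch(m -> m.isSynthetic() && m.getName().startsWith("access$"));
    }

    public static void main(String[] args) {
        System.out.println(new Outer().callThroughInner());
        System.out.println(Outer.class.isNestmateOf(Outer.Inner.class));
        System.out.println(hasAccessBridge(Outer.class));
    }
}
```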

    Parting thoughts
    ==========

    On many levels, species statics seem to be the missing ingredient
    for implementing many of the tricks of our translation strategy,
    as well as for making it easier to express common idioms (e.g.
    type-dependent caches) in user code.

    Adding support for species static has proven to be harder than
    originally thought. This is mainly because the current world is
    split into two static levels: static and instance. When something is
    not static it's implicitly assumed to be instance, and vice versa.
    If we add a third static level to the picture, a lot of the
    existing code just doesn't work anymore, or has to be validated to
    check as to whether 'static' means 'legacy static' or 'species
    static' (or both).

    I started the implementation by treating static, species static
    and instance as completely separate static levels - with different
    internal flags, etc. - but I soon realized that, while clean, this
    approach was invalidating too much of the existing implementation.
    More specifically, all the code snippets checking for static would
    have had to be updated to check for static OR species static
    (overriding vs. hiding, access to 'this', access to 'super',
    generic bridges, ...). On the other hand, the places where the
    semantics of species static and static differed were quite
    limited:

    * membership/type substitution: a species static behaves like an
    instance member; the type variables of the owner are substituted
    into the member signature.
    * resolution: we need to implement the correct access rules as
    shown in the tables above.
    * code generation: an invokestatic involving a species static gets
    a sharp qualifier type

    This quickly led to the realization that it was instead easier to
    just treat 'species static' as a special case of 'static' - and
    then to add finer grained logic whenever we really needed the
    distinction. This led to a considerably easier patch, and I think
    that a similar consideration will hold for the JLS.
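    The "special case of static" design can be pictured as a flag
    encoding: a species static member carries the ordinary STATIC bit
    plus an extra SPECIES bit, so existing isStatic() checks keep
    working, and only the few places that care test the extra bit. This
    is a hypothetical sketch, not javac's internal flag layout; reusing
    0x8000 for SPECIES is purely for illustration, and only two of the
    three differing places listed above (type substitution and code
    generation) are modeled.

```java
// Hypothetical encoding: a species static member carries STATIC plus an extra
// SPECIES bit, so every existing isStatic() check keeps treating it as static.
// SPECIES reuses 0x8000 purely for illustration; this is not javac's layout.
final class MemberFlags {
    static final int STATIC = 0x0008;
    static final int SPECIES = 0x8000;

    static final int INSTANCE_MEMBER = 0;
    static final int STATIC_MEMBER = STATIC;
    static final int SPECIES_MEMBER = STATIC | SPECIES;

    // Existing logic (no 'this', no 'super', hiding rather than overriding,
    // generic bridges) only needs this test.
    static boolean isStatic(int flags) {
        return (flags & STATIC) != 0;
    }

    static boolean isSpecies(int flags) {
        return (flags & SPECIES) != 0;
    }

    // Membership/type substitution: species static behaves like an instance member.
    static boolean substitutesOwnerTypeVars(int flags) {
        return !isStatic(flags) || isSpecies(flags);
    }

    // Code generation: an invokestatic gets a sharp owner only for species static.
    static boolean hasSharpStaticOwner(int flags) {
        return isStatic(flags) && isSpecies(flags);
    }
}
```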

    [1] - http://hg.openjdk.java.net/valhalla/valhalla/langtools/rev/6949c3d06e8f
    [2] - http://hg.openjdk.java.net/valhalla/valhalla/jdk/rev/836efde938c1
    [3] - http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-February/000096.html
    [4] - http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-May/000147.html

    Maurizio
