binding to indy with an annotation-driven reweaver

John Rose Thu, 21 Jul 2016 19:42:07 -0700

Inspired in part by a recent exchange between Remi and Charlie[1],
I've been thinking recently, again, about binding Java APIs to indy.


[1]: https://groups.google.com/forum/#!topic/jvm-languages/IjIEzDc_d3U

I think I have a way to make it work, and (what is more)
I think the end result looks pretty good.  Even better,
along the way we can create a mechanism for naturally
constant-folding selected method calls (like List.of)
at link-time.  (Library defined constants ahead!)

There are two things which make all this hard.
First, indy calls its BSM at link time (for the particular
indy instruction, since the JVM is lazily linked),
but the Java language does not expose link-time operations,
except very indirectly (in <clinit> code, for example).

Second, many good indy use cases are at least partly
signature-polymorphic, just like method handles. 
But the Java language does not allow you to generify over
method type signatures (e.g., argument types of (), (int),
(int,int), (String), (String,int), etc., all in one API point).
Even Valhalla only lets you generify over one value at
a time.  (One step at a time!)

Note that these two hard points might occur at the same
time, since some interesting BSMs are in fact signature
polymorphic.  (Well, at least they are varargs methods,
and varargs is an OK substitute for S-P, at link time.)

OK, so we are reduced to baking special handling for those
sorts of things into the language (as MethodHandle and
VarHandle do), or passing some sort of smoke signal
through the language and reweaving the bytecodes
(as Remi is so good at).

How would we signal a BSM call, through Java, in a way
that a bytecode reweaver could recognize it?

Here's an answer, maybe the simplest answer:  Mark
some API points as BSMs (in disguise; don't let javac
know!).  When the reweaver encounters a BSM call in
bytecode, it has to collect all the operands and ensure
that they are constants.  If they are constants, then
the reweaver collects the whole call, and sticks it
into an auxiliary static method (or pattern-matches
the whole call into an indy bootstrap specifier, if
possible).  If they are not constants, it is a reweaving
error.

class IndyTrickster {
   enum StringIndexerKind { LENGTH, HASHCODE, CHAR0, PARSEINT };
   @IndyTricks.AtLinkTime
   static ToIntFunction<String> indexString(StringIndexerKind k) { … }
}

class IndyTrickUser {
   int foo(String s) {
     return indexString(LENGTH).applyAsInt(s);  // returns s.length()
   }
   int bad(String s, StringIndexerKind k) {
     return indexString(k).applyAsInt(s);  // ERROR in reweaver?
   }
}

This is really just a mechanism for materializing link-time constants,
which all by itself is pretty interesting.

I've chosen enums here, because (a) they are constants, but
(b) they cannot be directly supported (today) as indy static
arguments.  So the reweaver has to dump some code in an
auxiliary method somewhere, at least to materialize the enum.

(And see JDK-8161256, "general data in constant pools", for
a better way forward.)
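As a rough sketch of what the reweaver might emit for a call like indexString(LENGTH), assume it dumps the constant expression into an auxiliary method and links it through a small BSM. Everything named here (LinkOnceSketch, aux$0, linkConstant, linkCount) is invented for illustration, and the indy instruction is simulated by calling the BSM by hand:

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.ToIntFunction;

class LinkOnceSketch {
    enum StringIndexerKind { LENGTH, HASHCODE }

    static int linkCount = 0;  // counts how often the "link-time" code runs

    // Stand-in for the @AtLinkTime API point from the message.
    static ToIntFunction<String> indexString(StringIndexerKind k) {
        linkCount++;
        switch (k) {
            case LENGTH: return String::length;
            default:     return String::hashCode;
        }
    }

    // Hypothetical auxiliary method a reweaver could generate: it materializes
    // the enum operand, which cannot be an indy static argument today.
    static ToIntFunction<String> aux$0() {
        return indexString(StringIndexerKind.LENGTH);
    }

    // Hypothetical BSM: runs the auxiliary method once and pins the result.
    static CallSite linkConstant(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle aux = lookup.findStatic(lookup.lookupClass(), name,
                MethodType.methodType(type.returnType()));
        Object value;
        try {
            value = aux.invoke();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
        return new ConstantCallSite(MethodHandles.constant(type.returnType(), value));
    }

    // Simulates executing the same indy instruction twice after one link.
    static int lengthTwice(String s) {
        try {
            CallSite site = linkConstant(MethodHandles.lookup(), "aux$0",
                    MethodType.methodType(ToIntFunction.class));
            MethodHandle indy = site.dynamicInvoker();
            @SuppressWarnings("unchecked")
            ToIntFunction<String> f1 = (ToIntFunction<String>) indy.invoke();
            @SuppressWarnings("unchecked")
            ToIntFunction<String> f2 = (ToIntFunction<String>) indy.invoke();
            return f1.applyAsInt(s) + f2.applyAsInt(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthTwice("hello") + " after " + linkCount + " link(s)");
    }
}
```

The constant is fetched twice but indexString runs only once, which is the whole point of folding at link time.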

We can stop here and use this trick to create APIs that materialize
constant values of type List, Map, etc.  Put the AtLinkTime
annotation on List.of, for example, et voila.

This raises the question, what happens if an operand fails to be
a link-time constant?  Should the reweaver silently keep the
call as-is, so that it runs every time, instead of just once at
link time?  Probably yes, but what happens when somebody
is expecting link-time folding, and wants to hear if it fails?

One answer:  An optional argument to the AtLinkTime annotation,
to determine what to do if the folding fails (error/warn/allow).
The List.of guys would just silently allow the non-folding uses,
since that is what they are used for now.
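One possible shape for that optional argument, as a sketch only: the enum name NonConstant, the default of ERROR, and the reflective policyOf helper are all assumptions made for the demo, not part of any existing API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.List;

class AtLinkTimeSketch {
    // What to do when an operand is not a link-time constant.
    enum NonConstant { ERROR, WARN, ALLOW }

    // Hypothetical annotation. RUNTIME retention is used here only so the
    // demo can read it reflectively; a reweaver could read CLASS retention
    // straight from the class file.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface AtLinkTime {
        NonConstant nonConstant() default NonConstant.ERROR;  // assumed default
    }

    // A List.of-style factory: folding is welcome but never required.
    @AtLinkTime(nonConstant = NonConstant.ALLOW)
    static List<String> listOf(String a, String b) {
        return List.of(a, b);
    }

    // Folding is the whole point here, so a failure should be reported.
    @AtLinkTime
    static int mustFold(int k) {
        return k * 2;
    }

    // The policy lookup a reweaver would perform when folding fails.
    static NonConstant policyOf(String methodName, Class<?>... paramTypes) {
        try {
            Method m = AtLinkTimeSketch.class.getDeclaredMethod(methodName, paramTypes);
            return m.getAnnotation(AtLinkTime.class).nonConstant();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(policyOf("listOf", String.class, String.class));
        System.out.println(policyOf("mustFold", int.class));
    }
}
```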

It would seem that link-time constants can only be produced by
static methods, but that would be wrong:  You can have link-time
computations that call non-static methods also, as long as the
receiver of each such method is itself a link-time constant.

(All of this suggests that the Java language should just have
a first-class notion of link-time constant, as it already has
compile-time constants.  But as we all know, Java grows
slowly and deliberately.  Experimenting first outside the JLS,
such as suggested here with reweavers, is a great way to
add weight to the case for change.)

What's next?  Well, so far the thing returned from the
constant-producing BSM has a type fully determined
by the static declaration of the BSM method.

You could grab some S-P magic from MethodHandles
to follow the BSM call by an S-P call, like this:

class IndyTrickster {
   @IndyTricks.AtLinkTime
   static MethodHandle doubler(Class<?> type) {
      return makeDoublerMH(type);
   }
   static MethodHandle makeDoublerMH(Class<?> type) {
      … // returns (type x) -> (String) Arrays.asList(x, x).toString()
   }
}

class IndyTrickUser {
   String foo1(String s) {
     return (String) doubler(String.class).invoke(s);
   }
   String foo2(int n) {
     return (String) doubler(int.class).invoke(n);
   }
}

Here the reweaver doesn't have to do anything special with
the MH.invoke.  But of course the BSM static argument
needs to be inferred; that's one of the things a BSM can
do which a mere constant-linker doesn't need to do.
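For concreteness, here is one way makeDoublerMH could be written with today's method handle API. This is a sketch under my own assumptions (the helper doubleIt and the class name DoublerSketch are invented); the message leaves the body elided.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

class DoublerSketch {
    // Loosely typed implementation, of type (Object)String.
    static String doubleIt(Object x) {
        return Arrays.asList(x, x).toString();
    }

    // One possible makeDoublerMH: adapt the generic handle to (type)String.
    static MethodHandle makeDoublerMH(Class<?> type) {
        try {
            MethodHandle generic = MethodHandles.lookup().findStatic(
                    DoublerSketch.class, "doubleIt",
                    MethodType.methodType(String.class, Object.class));
            // asType supplies the boxing or casting that javac would otherwise emit.
            return generic.asType(MethodType.methodType(String.class, type));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String foo1(String s) {
        try {
            return (String) makeDoublerMH(String.class).invoke(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    static String foo2(int n) {
        try {
            return (String) makeDoublerMH(int.class).invoke(n);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(foo1("a"));  // prints [a, a]
        System.out.println(foo2(7));    // prints [7, 7]
    }
}
```

Without a reweaver, makeDoublerMH runs on every call here; folding it to a constant is what the annotation is asking for.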

So what sort of smoke-signal can we send to the reweaver
to tell it how to thread the necessary type information into
the BSM call?

Well, we can use more annotations to mark the "special"
BSM arguments that are filled in silently.  There are three
of them:  The Lookup, the method name, and the method
type.  In this case, we want to pass the method type,
which will be (in the previous examples) something like
(String)String or (int)String.

class IndyTrickster {
   @IndyTricks.AtLinkTime
   static MethodHandle doubler(
        @IndyTricks.MethodTypeArg MethodType mtype) {
      Class<?> type = mtype.parameterType(0);
      … // as before, using type
   }
   // fake generic entry point for javac to hit:
   @IndyTricks.AtLinkTime
   static MethodHandle doubler() {
      return doubler(methodType(String.class, Object.class));
   }
}

class IndyTrickUser {
   String foo1(String s) {
     return (String) doubler().invoke(s);
   }
   String foo2(int n) {
     return (String) doubler().invoke(n);
   }
}

What's the rule here?  Well, the reweaver looks for a
method call (of a normal method) immediately
on the result of the constant expression (a link-time
method call or chain of them).  It says to itself, "Hmm, if this
were an indy call, what would be the first three BSM
arguments?"  And if the actual link-time method
asks for one of those arguments, it will happily
supply it.  You ask for a method type argument
by annotating it as @MethodTypeArg, and similarly
for method name and lookup.
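The "ask and you shall receive" rule can be sketched with plain reflection. The annotation names LookupArg and MethodNameArg, the class name, and the callLinker glue are hypothetical; only @MethodTypeArg is named in this message, and a real reweaver would resolve all of this statically rather than reflectively.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;

class ImplicitArgsSketch {
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER) @interface LookupArg {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER) @interface MethodNameArg {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER) @interface MethodTypeArg {}

    // A link-time method that asks only for the call's method type.
    static String wantsType(@MethodTypeArg MethodType mtype) {
        return "type=" + mtype;
    }

    // Another that asks for the name and the type, in that order.
    static String wantsNameAndType(@MethodNameArg String name,
                                   @MethodTypeArg MethodType mtype) {
        return name + ":" + mtype.parameterType(0).getSimpleName();
    }

    // Glue: supply whichever of the three implicit BSM arguments is requested.
    static Object callLinker(String linkerName, MethodHandles.Lookup lookup,
                             String name, MethodType type) {
        for (Method m : ImplicitArgsSketch.class.getDeclaredMethods()) {
            if (!m.getName().equals(linkerName)) continue;
            Parameter[] params = m.getParameters();
            Object[] args = new Object[params.length];
            for (int i = 0; i < params.length; i++) {
                if (params[i].isAnnotationPresent(LookupArg.class)) {
                    args[i] = lookup;
                } else if (params[i].isAnnotationPresent(MethodNameArg.class)) {
                    args[i] = name;
                } else if (params[i].isAnnotationPresent(MethodTypeArg.class)) {
                    args[i] = type;
                } else {
                    throw new IllegalArgumentException(
                            "explicit static arguments are not handled in this sketch");
                }
            }
            try {
                return m.invoke(null, args);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        throw new IllegalArgumentException("no such linker: " + linkerName);
    }

    public static void main(String[] args) {
        MethodType callType = MethodType.methodType(String.class, int.class);
        System.out.println(callLinker("wantsType", MethodHandles.lookup(), "invoke", callType));
        System.out.println(callLinker("wantsNameAndType", MethodHandles.lookup(), "invoke", callType));
    }
}
```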

There needs to be a pair of associated API points
here, one for javac to compile to, and one for
the reweaver to feed the extra argument to.
For now, let's just allow them to be overloadings
of the same name; we'll suggest something more
explicit later.

Let's try that with formatting, which uses varargs:

class IndyTrickster {
   @IndyTricks.AtLinkTime
   static MethodHandle formatterMH(
        @IndyTricks.MethodTypeArg MethodType mtype,
        String format) {
      Formatter.checkFormatArgTypes(format, mtype.parameterList());
      …
   }
   // fake generic entry point for javac to hit:
   @IndyTricks.AtLinkTime
   static MethodHandle formatterMH(String format) {
      … return something that doesn't check errors and accepts (Object...)
   }
}

class IndyTrickUser {
   String numberedLine(int lineno, String line) {
      return (String) formatterMH("%4d %s").invoke(lineno, line);
   }
   String bad(int lineno, String line) {
      return (String) formatterMH("%4d %s").invoke(line, lineno); // ERROR at link!
   }
}
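To make the link-time check concrete, here is a deliberately tiny version. Formatter.checkFormatArgTypes does not exist in the JDK, so the checker below is a hypothetical stand-in that understands only %d and %s (with optional width digits) and the literal %%. The method type is passed explicitly, as the reweaver would do.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.List;

class FormatLinkSketch {
    private static final List<Class<?>> INTEGRAL = List.of(
            int.class, long.class, short.class, byte.class,
            Integer.class, Long.class, Short.class, Byte.class);

    // Hypothetical checker: rejects a %d whose argument type is not integral.
    static void checkFormatArgTypes(String format, List<Class<?>> argTypes) {
        int arg = 0;
        for (int i = 0; i < format.length(); i++) {
            if (format.charAt(i) != '%') continue;
            i++;
            while (i < format.length() && Character.isDigit(format.charAt(i))) i++;
            if (i >= format.length()) throw new IllegalArgumentException("dangling %");
            char conv = format.charAt(i);
            if (conv == '%') continue;  // literal percent sign, consumes no argument
            if (conv != 'd' && conv != 's')
                throw new IllegalArgumentException("unsupported conversion %" + conv);
            if (arg >= argTypes.size())
                throw new IllegalArgumentException("too few arguments for format");
            Class<?> t = argTypes.get(arg++);
            if (conv == 'd' && !INTEGRAL.contains(t))
                throw new IllegalArgumentException("%d cannot format " + t.getSimpleName());
        }
    }

    // Link-time half: check once, then build a handle of exactly the call's type.
    static MethodHandle formatterMH(MethodType mtype, String format) {
        checkFormatArgTypes(format, mtype.parameterList());
        try {
            MethodHandle fmt = MethodHandles.lookup().findStatic(String.class, "format",
                    MethodType.methodType(String.class, String.class, Object[].class));
            return MethodHandles.insertArguments(fmt, 0, format)
                    .asCollector(Object[].class, mtype.parameterCount())
                    .asType(mtype);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String numberedLine(int lineno, String line) {
        try {
            MethodType mtype = MethodType.methodType(String.class, int.class, String.class);
            return (String) formatterMH(mtype, "%4d %s").invoke(lineno, line);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(numberedLine(7, "x"));
        try {
            // Swapped argument types, as in the "bad" method above.
            formatterMH(MethodType.methodType(String.class, String.class, int.class), "%4d %s");
        } catch (IllegalArgumentException e) {
            System.out.println("link error: " + e.getMessage());
        }
    }
}
```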

Now the non-annotated BSM arguments look lonely.
Maybe there's another annotation, @LinkTimeArg,
for the other BSM static arguments.  Of course,
all arguments to an @AtLinkTime method are
link time arguments, aren't they?  Maybe not.

That leads to an extremely interesting question:
What would it mean to equip a link-time method
with a *mix* of link-time and run-time arguments?
This leads away from reweavers into language
design, but let's go there just for a second.

class IndyTrickster {
   // P-E-able version of String.matches:
   static boolean stringMatches(
        String string,
        static String regex) {  // IGNORE THIS BIKESHED
     // compile the regex, once at link time:
     Pattern p = Pattern.compile(regex);
     // link-time operations are all finished now!
     return p.matcher(string).matches();
   }
   // we'd want to do format like this, if we had Formatter.compile
   static String stringFormat(
        static String format,
        Object... args) { … }
}

class IndyTrickUser {
   boolean hasFoo(String s) {
     return stringMatches(s, "foo");
     // static arg is SECOND
   }
   String numberedLine(int lineno, String line) {
      return stringFormat("%4d %s", lineno, line);
     // static arg is FIRST
   }
}

Note that the stringMatches method wants to be split in half,
with the first half run (once) at link time, and the second run
(one or more times) each time the call to stringMatches is
executed.  Doing that for real will require some serious
cooperation with the JVM and JLS.  But can we fake it?

Yes, it can be faked if we allow these tricky API points
to be associated together somehow (another annotation convention),
so that one API point accepts both static and regular arguments,
and the reweaver goes and finds the associated API point
(which the user doesn't call directly) that handles the
static arguments, and a third which handles the final call.

Like this:

class IndyTrickster {
   @IndyTricks.AtLinkTime(nonConstant=ALLOW, mixedArgs=true)
   static boolean stringMatches(
       @IndyTricks.RunTimeArg String string,
       @IndyTricks.LinkTimeArg String regex) {
     Pattern p = stringMatchesLinker(regex);
     return stringMatchesRunner(p, string);
   }
   @IndyTricks.AtLinkTime(linkerFor="stringMatches")
   static Pattern stringMatchesLinker(String regex) {
     return Pattern.compile(regex);
   }
   @IndyTricks.AtRunTime(runnerFor="stringMatches")
   static boolean stringMatchesRunner(
       @IndyTricks.LinkTimeArg Pattern p,
       String string) {
     return p.matcher(string).matches();
   }
}

The rule is that the reweaver splits the nominal
call to stringMatches into a link-time call and
a run-time call.  The run-time call takes one link-time
argument, the result (non-void) from the link-time
call.  That is simply a link-time constant, no different
from a constant list; in this case it is a compiled
regular expression pattern, but as we have seen
it can also be a method handle.

(The "mixedArgs" argument isn't strictly necessary,
since the annotated arguments are obviously mixed.)
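Here is roughly what the rewoven call could link to, simulated without a reweaver. The BSM name bootstrap, the class name, the counter, and the regex "fo+" are assumptions for the demo; the linker and runner follow the example above.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.regex.Pattern;

class SplitCallSketch {
    static int compileCount = 0;  // how often the link-time half has run

    // Link-time half: runs once per call site.
    static Pattern stringMatchesLinker(String regex) {
        compileCount++;
        return Pattern.compile(regex);
    }

    // Run-time half: runs on every execution of the call.
    static boolean stringMatchesRunner(Pattern p, String string) {
        return p.matcher(string).matches();
    }

    // Hypothetical BSM: the regex arrives as an ordinary static argument.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name,
                              MethodType type, String regex)
            throws ReflectiveOperationException {
        Pattern p = stringMatchesLinker(regex);
        MethodHandle runner = lookup.findStatic(SplitCallSketch.class,
                "stringMatchesRunner",
                MethodType.methodType(boolean.class, Pattern.class, String.class));
        // Bind the link-time constant, leaving a handle of type (String)boolean.
        return new ConstantCallSite(runner.bindTo(p).asType(type));
    }

    // Simulates one indy call site executed once per input string.
    static int countMatches(String... inputs) {
        try {
            CallSite site = bootstrap(MethodHandles.lookup(), "stringMatches",
                    MethodType.methodType(boolean.class, String.class), "fo+");
            MethodHandle indy = site.dynamicInvoker();
            int count = 0;
            for (String s : inputs) {
                if ((boolean) indy.invokeExact(s)) count++;
            }
            return count;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatches("foo", "bar", "fooo")
                + " matches, " + compileCount + " compile(s)");
    }
}
```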

That works pretty well, except for the exposure of messy
extra API points.  But (and this part is pretty fun) those
extra API points can be made non-public, if we want.
After all, if the reweaver succeeds in converting the
call to indy, all it needs to link the instruction is a suitable
BSM.  That BSM needs some sort of handshake with
the defining class of stringMatches to produce the
method handles for the auxiliary functions.

(The method handles must be produced by privileged
code, but that privileged code can enforce rules
which prevent security holes.  For example,
methods outside of a class cannot nominate
themselves as helpers to a link-time function.
Or the link-time function can go the other way
and nominate its helpers directly, thereby
delegating their access to its clients.
I like the balanced look of the annotations
when the helpers nominate themselves,
but it's a matter of taste.)

All right, so we can, with some setup work,
create API points which access a hybrid of
link-time and run-time processing.  What's
left?  The signature polymorphism, of course.

We already exhibited use of an S-P method
with these link-time APIs; it was MH.invoke.
Our options are limited here; the language
only recognizes a handful of methods for
this special treatment, and we don't want to
see their names scattered all over our code.

(At this point, I'd like to point out that, in theory,
*almost all* Java API points could be treated as
signature-polymorphic, as long as there were a way
for the JVM's link-resolver to do two things:
Record the exact intended method descriptor
to call, and arrange to adjust the caller's
descriptor to match the callee's.  Today
this is done by javac, which puts boxing,
unboxing, casting, and varargs instructions
around the call.  Tomorrow it could be an indy
instruction which defers all the work to an
asType call at link time.)

Anyway, a reweaver can simulate the presence
of additional signature-polymorphic entry points
by *undoing* the argument matching code inserted
by javac, and passing the originally typed arguments
to an indy call, with a BSM which includes enough
smarts to identify the correct receiver (a MH constant
in the constant pool) and apply asType to that MH
constant to match to the caller's invocation type.
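A small sketch of such a BSM, using Math.max(long,long) as the callee and (int,int)Object as the caller's invocation type. The BSM name linkWithAsType and the class name are invented; the asType call does the widening and boxing that javac would otherwise emit around the call.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class AsTypeLinkSketch {
    // Hypothetical BSM: resolve the callee, then adapt it to the caller's type.
    static CallSite linkWithAsType(MethodHandles.Lookup lookup, String name,
                                   MethodType callerType)
            throws ReflectiveOperationException {
        // The callee's own descriptor is (long,long)long.
        MethodHandle callee = lookup.findStatic(Math.class, name,
                MethodType.methodType(long.class, long.class, long.class));
        return new ConstantCallSite(callee.asType(callerType));
    }

    // The caller passes ints and expects an Object back, with no javac glue.
    static Object maxOfInts(int a, int b) {
        try {
            CallSite site = linkWithAsType(MethodHandles.lookup(), "max",
                    MethodType.methodType(Object.class, int.class, int.class));
            return (Object) site.dynamicInvoker().invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Object r = maxOfInts(3, 5);
        System.out.println(r + " (" + r.getClass().getSimpleName() + ")");
    }
}
```

The result comes back as a boxed Long, because the callee's long return value is boxed on the way out.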

There's a difficulty here:  The reweaver can only
guess whether a given call to (say) Integer.valueOf
was really written by the user, or was inserted by
javac for autoboxing.  I think (but don't know)
that it is harmless to assume that they are all
"noise" calls that can be removed safely.

So, for the final act of this annotation extravaganza,
let's suppose there is an annotation which says
"do not change the original type of this argument".
This annotation would apply to return values
as well.  When applied to a varargs argument,
it would mean "don't make a varargs array,
just pass the original arguments, unchanged".
We will annotate such an argument as
"polymorphic".

(Such tricks are pointless unless there is
a way to replace the now-useless loosely typed
call by a method handle that accepts the exact
changed call type.  So we need another smoke
signal which tells the reweaver to completely
replace the original run-time call with a method
handle invocation.  We'll do this with an option
on the AtLinkTime annotations.)

Again, we need helpers.  Here's the doubler
example reworked with helpers and polymorphic
arguments:

class IndyTrickster {
   // fake generic entry point for javac to hit:
   @IndyTricks.AtLinkTime
   static String doubler(
        @IndyTricks.RunTimeArg(polymorphic=true) Object x) {
      return (String) doublerLinker(
            methodType(String.class, Object.class)).invoke(x);
   }
   // it has help:
   @IndyTricks.AtLinkTime(linkerFor="doubler",
        itsPolymorphicSoJustInvokeTheResult=true)
   static MethodHandle doublerLinker(
        @IndyTricks.MethodTypeArg MethodType mtype) {
      return makeDoublerMH(mtype.parameterType(0));
   }
   // there is no @AtRunTime; link time result is invoked
}

class IndyTrickUser {
   String foo1(String s) {
     return doubler(s);  // makes a constant MH, invokes it exactly
   }
   String foo2(int n) {
     return doubler(n);  // makes a constant MH, invokes it exactly
   }
}

Let's put it all together with a formatter example.

class IndyTrickster {
   @IndyTricks.AtLinkTime
   static String format(
        @IndyTricks.LinkTimeArg String format,
        @IndyTricks.RunTimeArg(polymorphic=true) Object... args) {
      return (String) formatLinker(
            genericMethodType(args.length).changeReturnType(String.class),
            format).invokeWithArguments(args);
   }

   @IndyTricks.AtLinkTime(linkerFor="format",
        itsPolymorphicSoJustInvokeTheResult=true)
   static MethodHandle formatLinker(
        @IndyTricks.MethodTypeArg MethodType mtype,
        @IndyTricks.LinkTimeArg String format) {
      return buildCustomFormatMH(format, mtype.parameterList());
   }
}

class IndyTrickUser {
   String numberedLine(int lineno, String line) {
      return format("%4d %s", lineno, line); // link-time optimized MH does format
   }
   String bad(int lineno, String line) {
      return format("%4d %s", line, lineno); // ERROR at link!
   }
}

(The annotation argument which says "just invoke
the result" is actually superfluous.  That is not
because the link-time helper returns a method
handle, since maybe the user is expecting a
method handle as a plain constant.  But the presence
of polymorphic arguments in the original method
that was replaced forces a method handle invocation.
If we can agree that this move is unavoidable,
we can omit the silly-looking annotation argument.)

OK, enough examples.

Are the annotations designed right?  Probably not.
Right now the Lumper in me is stronger than the
Splitter, so I'd say we could do nicely with one
annotation to mark methods as participating
in link-time processing (whether they are
helpers or not, and even if they are run-time
helpers), and one parameter annotation which
includes options for all the above degrees of
freedom.  Maybe @LinkConstant and
@LinkParameter.  (Visions of bikeshed
shantytowns…)

The net result is that we can treat tasks like
regular expression compilation and format string
verification as link-time operations, with the possibility
of optimizing the exact use case.  This works well
for some existing uses of invokedynamic, for
lambda creation and (more recently) string
concatenation.  Formatting can be viewed as
the next step beyond string concatenation.

As we use invokedynamic more and more,
we push more and more useful work into the
link phase.  This makes both source code and
bytecode very compact, but forces the JIT to
optimize only after link processing has performed
its magic.  This conflicts with ahead-of-time
compilation, which needs to see the results
of link processing before it generates code,
but doesn't have a running JVM to run the
linkage logic.  We will need to do further work
to create rules and frameworks for isolated
execution of link-time logic apart from the
actual application, before it runs.  Marking
some of our methods as @AtLinkTime is
a first step in that direction.

One more observation:  We haven't used any
call sites here.  Where are the call sites?  Users
need mutable call sites to build inline caches.
Of course, a mutable call site is just another kind
of constant.  (Well, it's a mutable constant, which
is odd.)  All the observations about creating constants
at link time apply to inline caches.  We want an
error signaled if the reweaver can't fold an inline cache,
so the annotation should say "nonConstant=ERROR".

In the above reweaver tricks, the handle returned for
a call site can just be the CallSite.dynamicInvoker,
and you are done.  It would be even more graceful
for the reweaver to accept a CallSite (or subtype
thereof) wherever a MethodHandle is sought,
and just do the right thing with it.
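A minimal illustration of the call-site-as-constant idea, using only existing java.lang.invoke API; the slow and fast targets and the class name are made up for the demo. The dynamicInvoker handle is obtained once and never changes, while the target behind it does.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

class InlineCacheSketch {
    static String slow(Object x) { return "slow:" + x; }
    static String fast(Object x) { return "fast:" + x; }

    static String run() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(String.class, Object.class);
            MethodHandle slowMH = lookup.findStatic(InlineCacheSketch.class, "slow", type);
            MethodHandle fastMH = lookup.findStatic(InlineCacheSketch.class, "fast", type);

            // The call site is the link-time "constant"; its target is mutable.
            MutableCallSite site = new MutableCallSite(slowMH);
            MethodHandle invoker = site.dynamicInvoker();

            String first = (String) invoker.invoke("a");
            site.setTarget(fastMH);  // the cache gets patched after a slow-path hit
            String second = (String) invoker.invoke("a");
            return first + "," + second;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());  // prints slow:a,fast:a
    }
}
```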

I guess that's enough to keep us busy for a while.
Anybody want to take a whack at it?

— John
