Re: and factories

John Rose Thu, 17 Oct 2019 13:03:22 -0700

On Oct 17, 2019, at 11:22 AM, Dan Smith <[email protected]> wrote:
> 
> The plan of record for compiling the constructors of inline classes is to 
> generate static methods named "<init>" with an appropriate return type, and 
> invoke them with 'invokestatic'.
> 
> This requires relaxing the existing restrictions on method names and 
> references. Historically, the special names "<init>" and "<clinit>" have been 
> reserved for special-purpose JVM rules (for example, 'invokespecial' is 
> treated like a distinct instruction if it invokes a method named '<init>'); 
> for convenience, we've also prohibited all other method names that include 
> the characters '<' or '>' (JVMS 4.2.2).
> 
> Equivalently, we might say that, within the space of method names, we've 
> carved out a reserved space for special purposes: any names that include '<' 
> or '>'.
> 
> A few months ago, I put together a tentative specification that effectively 
> cedes a chunk of the reserved space for general usage [1]. The names "<init>" 
> and "<clinit>" are no longer reserved, *unless* they're paired with 
> descriptors of a certain form ("(.*)V" and "()V", respectively). Pulling on 
> the thread, we could even wonder whether the JVM should have a reserved space 
> at all—why can't I name my method "bob>" or "<janet>", for example?
> 
> In retrospect, I'm not sure this direction is such a good idea. There is 
> value in having well-known names that instantly indicate important 
> properties, without having more complex tests. (Complex tests are likely to 
> be a source of bugs and security exploits.) Since the JVM ecosystem is 
> already accustomed to the existence of a reserved space for special method 
> names, we can keep that space for free, while it's potentially costly to give 
> it up.
> 
> So here's an alternative design:
> 
> - "<init>" continues to indicate instance initialization methods; "<clinit>" 
> continues to indicate class initialization methods
> 
> - A new reserved name, "<new>", say, can be used to declare factories
> 
> - To avoid misleading declarations, methods named "<new>" must be static and 
> have a return type that matches their declaring class; only 'invokestatic' 
> instructions can reference them
> 
> - The rest of the "<.*>" space of names (plus ".*<.*" and ".*>.*") is held in 
> reserve, available for special purposes as we discover them
> 
> The Java compiler would only use "<new>" methods for inline class 
> construction, for now; perhaps in the future we'll find other use cases that 
> make sense (like surfacing some sort of factory mechanism).
> 
> Does this seem promising? Any particular reason it's better to overload 
> "<init>" than just come up with a new special name?
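
The alternative design can be sketched as a small classifier over a method's name, descriptor, and ACC_STATIC bit. This is a hypothetical illustration of the rules as quoted: the class and method names are invented, the <clinit> check is simplified, and it is neither JVMS text nor HotSpot code.

```java
// Hypothetical sketch of the quoted naming rules; names are illustrative only.
final class ReservedNames {
    enum Kind { INSTANCE_INIT, CLASS_INIT, FACTORY, ORDINARY, ILLEGAL }

    /** Classifies a method declaration by name, descriptor, and ACC_STATIC. */
    static Kind classify(String name, String descriptor, boolean isStatic,
                         String declaringClassDescriptor) {
        // Any '<' or '>' anywhere in the name puts it in the reserved space.
        boolean reserved = name.indexOf('<') >= 0 || name.indexOf('>') >= 0;
        if (!reserved) {
            return Kind.ORDINARY;
        }
        switch (name) {
            case "<init>":
                // Instance initialization methods: non-static, descriptor "(.*)V".
                return (!isStatic && descriptor.endsWith(")V"))
                        ? Kind.INSTANCE_INIT : Kind.ILLEGAL;
            case "<clinit>":
                // Class initialization methods: descriptor "()V"; simplified to also
                // demand ACC_STATIC, as class files of version 51+ do.
                return (isStatic && descriptor.equals("()V"))
                        ? Kind.CLASS_INIT : Kind.ILLEGAL;
            case "<new>":
                // Proposed factories: static, returning exactly the declaring class.
                return (isStatic && returnType(descriptor).equals(declaringClassDescriptor))
                        ? Kind.FACTORY : Kind.ILLEGAL;
            default:
                // The rest of the reserved space is held in reserve.
                return Kind.ILLEGAL;
        }
    }

    /** Per the proposal, only invokestatic may reference a <new> method. */
    static boolean mayReferenceFactory(String opcode) {
        return opcode.equals("invokestatic");
    }

    /** Returns the return-type part of a method descriptor, e.g. "LFoo;". */
    static String returnType(String descriptor) {
        return descriptor.substring(descriptor.lastIndexOf(')') + 1);
    }
}
```

Under this sketch, names like "bob>" and "<janet>" are rejected because they fall in the reserved space without matching one of the three special names.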


For my part either outcome is fine.  The prototype overloads <init> but it 
could almost as well have added <new>.

Fine points in the VM prototype:

- A method <init> must be static, and it can be restricted to return exactly 
the type of its declaring class, except in “cases”.
- In some cases (VMACs and hidden classes) the declaring class is not denotable 
in a descriptor; the return type must be a super (maybe always Object).

So the prototype allows Object as a return type from a static <init> function.  
I don’t remember whether it checks that the declaring class is a VMAC in that 
case.
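The prototype's return-type rule for a static <init> can be sketched the same way. This assumes the stricter reading, in which Object is accepted only when the declaring class cannot be named in a descriptor (VMACs, hidden classes); whether the prototype actually performs that check is left open above. All names are hypothetical.

```java
// Hypothetical sketch of the return-type rule for a static <init> factory.
// "denotable" means the declaring class can be named in a descriptor.
final class StaticInitReturnCheck {
    static final String OBJECT = "Ljava/lang/Object;";

    static boolean returnTypeOk(String descriptor, String declaringClassDescriptor,
                                boolean declaringClassDenotable) {
        String ret = descriptor.substring(descriptor.lastIndexOf(')') + 1);
        if (declaringClassDenotable) {
            // Normal case: the factory returns exactly its declaring class.
            return ret.equals(declaringClassDescriptor);
        }
        // Non-denotable class (assumed stricter variant): only Object is accepted.
        return ret.equals(OBJECT);
    }
}
```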

Would there be any restrictions on the contents of a constructor/factory method 
<new>?  (I hope not.)

Would there be any enhancements to the capabilities of a <new> function?

For example, I think we should consider allowing <new> to invokespecial
super.<init> on a new instance, and/or putfield into the final fields of the
new instance.
If we don’t allow this, then translation strategies may have to spin private
<init> methods to handle the super call and final field inits, which seems 
suboptimal to me.
(To be clear:  I’m thinking of using <new> here in a non-inline class.)

One result of using a different name (<new>) is that there’s no need to
constrain whether it is static.
I don’t think there’s any benefit to requiring that <new> be static.  (Well,
maybe some:  It partitions <new> from any kind of virtual call.)  Maybe a
non-static <new> could serve as a factory method which takes the current
instance and “reconstructs” it as a new instance.  But that can be done by
wrapping a static <new> into some other method m, and then there’s no
confusion about making m virtual.
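At the Java source level, the "wrap a static <new> in some other method m" alternative looks roughly like this, with a static factory standing in for <new> (Java source cannot declare a <new> method directly). Point, create, and withX are hypothetical names.

```java
// Source-level sketch: create() stands in for a static <new>, and withX()
// plays the role of the wrapper m that "reconstructs" the current instance.
class Point {
    private final int x;
    private final int y;

    private Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    /** Stand-in for a static <new> factory. */
    static Point create(int x, int y) {
        return new Point(x, y);
    }

    /** The wrapper m: an ordinary instance method calling the static factory. */
    Point withX(int newX) {
        return create(newX, y);
    }

    int x() { return x; }
    int y() { return y; }
}
```

Because withX is an ordinary instance method, it can be virtual or overridden in the usual way, and the factory itself needs no rule about receivers.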

> [1] 
> http://cr.openjdk.java.net/~dlsmith/lw2/lw2-20190628/specs/init-methods-jvms.html

Using something like <new> is a forced move for inline classes.  It is also
(IMO) a fruitful move for regular non-inline (“identity”) classes.  If the
translation strategy were adjusted to translate every new Foo() expression
as invokestatic <new>, the following benefits would appear:

- Less reliance on the verifier to validate arbitrary-in-the-wild 
“new/dup/invokespecial” code shapes.  (It’s been buggy in the past.)
- Simpler, more optimizable bytecode for complex expressions like new A(…new
B()…), currently a pain point in our JITs.
- A more direct path for migrating “new VT()” expressions from VT as a 
value-based class to an inline class.  (No migration with 
new/dup/invokespecial.)
- More compact (and analyzable) classfiles, when they contain new A(…) 
expressions.
- A future option to make the “new instance” instruction be *private* to the 
class which it is constructing, a probable security benefit.
- A future option to separate, at the language level, the capability of 
constructing a subclass instance (super()) from requesting a new object (new 
A()).

— John

P.S.  About that last option:  A public constructor C allows *both* creation
of new instances and subclassing.  It is difficult to separately control
access for these operations.  (They correspond to calls to C.super.<init> and
to C.<new>.)  If it were possible to tease these apart as separate API points
(corresponding to the distinct underlying names), then they could be given
independent access control (one public, one private, etc.).
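For comparison, the closest Java-source approximations available today each close one door entirely rather than giving the two operations independent access levels. These are common idioms shown for illustration; the class names are hypothetical.

```java
// CreateOnly: creation is public via a factory; subclassing is closed off
// because the only constructor is private (nested classes aside).
class CreateOnly {
    private CreateOnly() { }

    public static CreateOnly create() {   // the "<new>" role
        return new CreateOnly();
    }
}

// ExtendOnly: subclassing is allowed, but direct creation is not,
// because the class is abstract.
abstract class ExtendOnly {
    protected ExtendOnly() { }
}

final class Concrete extends ExtendOnly {
    Concrete() {
        super();                          // the "<super>" role
    }
}
```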

In fact, a clearer separation would be to call the super-version
C.super.<super>, so that super() calls could be translated to invokespecial
<super> (with the same powers and responsibilities as for <init> in that
position), and new T() calls would be translated to invokestatic <new>.
<init> could still serve both roles at once in various use cases, but a
class translation might have only <super> and <new>, or perhaps <super> and
<new> and some private <init> methods to factor out code used by both,
locally.

I’ll tip my hand here:  I think of a <new> method as a “final constructor”:
It’s the use of a constructor in the terminal position, when the requested
class is known, and *not* when a random subclass is requesting
initialization of one of its progeny.  I also think of <super> (or <init>
used only by subclasses) as an “abstract constructor”.  It’s the use of a
constructor in a non-terminal position, when the requested class is some
subclass elsewhere but it needs to call up the super chain for proper
instantiation.  The analogy with final and abstract methods is not exact,
but it is close enough that I think there’s something there.
In this mindset, I think of today’s <init> as a hack which performs both
jobs, even though they are distinct, and of today’s constructor notation as
defining *both* the <new> and the <super> methods, and indeed stashing the
one copy of the code on the <init> hack.
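The "one <init>, two jobs" point can be observed from Java source today: the same constructor body runs both for new Base() (terminal position) and for super() from a subclass (non-terminal position), and the body can only tell the roles apart by comparing getClass() with its own class. A minimal sketch with hypothetical names:

```java
class Base {
    final String role;

    Base() {
        // getClass() reports the runtime class even during construction.
        role = (getClass() == Base.class)
                ? "final (terminal: new Base())"
                : "abstract (via super() from " + getClass().getSimpleName() + ")";
    }
}

class Derived extends Base {
    Derived() {
        super();  // same Base() body, now in the non-terminal position
    }
}
```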

When we get bridging technology, we can declaratively spin bridges from
non-private <new> and <super> API points (w/o bodies) into private <init>
methods.  So the extra distinctions I’m thinking of don’t have to end up
duplicating bytecodes, in the common case where a class needs to define
parallel <new> and <super> API points.
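Until bridging technology exists, this shape can be approximated by hand: a single private constructor holds the one copy of the body, and two thin entry points forward to it, one for subclasses and one for creation. This is a source-level sketch under those assumptions (Account, open, and the Void tag parameter that keeps the overloads distinct are hypothetical), not what a compiler or VM would actually spin.

```java
class Account {
    private final String id;
    private final long balance;

    // The one copy of the initialization code (the "private <init>").
    private Account(String id, long balance, Void tag) {
        if (balance < 0) {
            throw new IllegalArgumentException("negative balance");
        }
        this.id = id;
        this.balance = balance;
    }

    // "<super>" role: subclass constructors chain here. Java's protected also
    // admits same-package callers, so the access split is only approximate.
    protected Account(String id, long balance) {
        this(id, balance, null);
    }

    // "<new>" role: creation requests come here.
    public static Account open(String id, long balance) {
        return new Account(id, balance, null);
    }

    String id() { return id; }
    long balance() { return balance; }
}

class SavingsAccount extends Account {
    SavingsAccount(String id) {
        super(id, 0L);
    }
}
```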
