On Oct 17, 2019, at 11:22 AM, Dan Smith <[email protected]> wrote:
>
> The plan of record for compiling the constructors of inline classes is to generate static methods named "<init>" with an appropriate return type, and invoke them with 'invokestatic'.
>
> This requires relaxing the existing restrictions on method names and references. Historically, the special names "<init>" and "<clinit>" have been reserved for special-purpose JVM rules (for example, 'invokespecial' is treated like a distinct instruction if it invokes a method named '<init>'); for convenience, we've also prohibited all other method names that include the characters '<' or '>' (JVMS 4.2.2).
>
> Equivalently, we might say that, within the space of method names, we've carved out a reserved space for special purposes: any names that include '<' or '>'.
>
> A few months ago, I put together a tentative specification that effectively cedes a chunk of the reserved space for general usage [1]. The names "<init>" and "<clinit>" are no longer reserved, *unless* they're paired with descriptors of a certain form ("(.*)V" and "()V", respectively). Pulling on the thread, we could even wonder whether the JVM should have a reserved space at all: why can't I name my method "bob>" or "<janet>", for example?
>
> In retrospect, I'm not sure this direction is such a good idea. There is value in having well-known names that instantly indicate important properties, without having more complex tests. (Complex tests are likely to be a source of bugs and security exploits.) Since the JVM ecosystem is already accustomed to the existence of a reserved space for special method names, we can keep that space for free, while it's potentially costly to give it up.
>
> So here's an alternative design:
>
> - "<init>" continues to indicate instance initialization methods; "<clinit>" continues to indicate class initialization methods
>
> - A new reserved name, "<new>", say, can be used to declare factories
>
> - To avoid misleading declarations, methods named "<new>" must be static and have a return type that matches their declaring class; only 'invokestatic' instructions can reference them
>
> - The rest of the "<.*>" space of names (plus ".*<.*" and ".*>.*") is held in reserve, available for special purposes as we discover them
>
> The Java compiler would only use "<new>" methods for inline class construction, for now; perhaps in the future we'll find other use cases that make sense (like surfacing some sort of factory mechanism).
>
> Does this seem promising? Any particular reason it's better to overload "<init>" than just come up with a new special name?
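As a rough illustration of how simple the proposed name rules are to check, here is a minimal Java sketch of a class-file checker for the "<new>" design quoted above. The class and method names (ReservedNames, isLegalDeclaration, isLegalInvocation) are hypothetical. Descriptors are in JVM internal form. The sketch deliberately ignores the version-dependent details of the real JVMS checks and models only the rules as stated in the proposal.

```java
// Hypothetical sketch of the name rules in the "<new>" proposal; not the JVMS text.
// Descriptors use JVM internal form, e.g. "(II)LPoint;".
public final class ReservedNames {

    private ReservedNames() {}

    /** Return-type part of a method descriptor, e.g. "LPoint;" for "(II)LPoint;". */
    static String returnType(String descriptor) {
        int close = descriptor.lastIndexOf(')');
        if (!descriptor.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("not a method descriptor: " + descriptor);
        }
        return descriptor.substring(close + 1);
    }

    /** A name is in the reserved space if it contains '<' or '>'. */
    static boolean isReserved(String name) {
        return name.indexOf('<') >= 0 || name.indexOf('>') >= 0;
    }

    /**
     * "<init>": instance initializer (non-static, returns void).
     * "<clinit>": class initializer (static, "()V").
     * "<new>": factory (static, returns exactly the declaring class).
     * Any other name containing '<' or '>' is held in reserve, so it is rejected.
     */
    public static boolean isLegalDeclaration(String name, String descriptor,
                                             boolean isStatic, String declaringClass) {
        switch (name) {
            case "<init>":
                return !isStatic && returnType(descriptor).equals("V");
            case "<clinit>":
                return isStatic && descriptor.equals("()V");
            case "<new>":
                return isStatic && returnType(descriptor).equals("L" + declaringClass + ";");
            default:
                return !isReserved(name);
        }
    }

    /** Which invocation instruction may reference a method with this name. */
    public static boolean isLegalInvocation(String opcode, String name) {
        switch (name) {
            case "<init>":
                return opcode.equals("invokespecial");
            case "<clinit>":
                return false; // never invoked by bytecode; the JVM runs it during class initialization
            case "<new>":
                return opcode.equals("invokestatic");
            default:
                return !isReserved(name); // ordinary names: other rules (not modeled here) apply
        }
    }

    public static void main(String[] args) {
        System.out.println(isLegalDeclaration("<new>", "(II)LPoint;", true, "Point"));            // true
        System.out.println(isLegalDeclaration("<new>", "(II)Ljava/lang/Object;", true, "Point")); // false
        System.out.println(isLegalDeclaration("<janet>", "()V", true, "Point"));                  // false
        System.out.println(isLegalInvocation("invokevirtual", "<new>"));                          // false
    }
}
```

Each rule is a fixed-name lookup plus at most one descriptor comparison, which is the kind of "instantly recognizable" test the message argues for.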
For my part, either outcome is fine. The prototype overloads <init>, but it could almost as well have added <new>.

Fine points in the VM prototype:
- A method <init> must be static, and it can be restricted to return exactly the type of its declaring class, except in “cases”.
- In some cases (VMACs, i.e. VM anonymous classes, and hidden classes) the declaring class is not denotable in a descriptor; the return type must be a super (maybe always Object). So the prototype allows Object as a return type from a static <init> function. I don’t remember whether it checks that the declaring class is a VMAC in that case.

Would there be any restrictions on the contents of a constructor/factory method <new>? (I hope not.)

Would there be any enhancements to the capabilities of a <new> function? For example, I think we should consider allowing <new> to invokespecial super.<init> on a new instance, and/or putfield into the final fields of the new instance. If we don’t allow this, then translation strategies may have to spin private <init> methods to handle the super call and final field inits, which seems suboptimal to me. (To be clear: I’m thinking of using <new> here in a non-inline class.)

One result of using a different name (<new>) is that we are free to decide whether it must be static. I don’t think there’s any benefit to requiring that <new> be static. (Well, maybe some: it partitions <new> from any kind of virtual call.) Maybe a non-static <new> could serve as a factory method which takes the current instance and “reconstructs” it as a new instance. But that can be done by wrapping a static <new> into some other method m, and then there’s no confusion about making m virtual.

> [1] http://cr.openjdk.java.net/~dlsmith/lw2/lw2-20190628/specs/init-methods-jvms.html

Using something like <new> is a forced move for inline classes. It is also (IMO) a fruitful move for regular non-inline (“identity”) classes.
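A minimal Java sketch of the wrapping alternative, using a hypothetical Point class. Java source cannot declare a method named <new>, so a static create method stands in for it. The instance method withX plays the role of the virtual method m that “reconstructs” the receiver as a new instance by delegating to the static factory.

```java
// Hypothetical illustration: `create` stands in for a static <new> factory,
// and `withX` is an ordinary virtual method m that wraps it.
public class Reconstruct {

    static class Point {
        private final int x;
        private final int y;

        // The one copy of the initialization code.
        private Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Stand-in for a static <new>: takes the arguments, returns a new instance.
        static Point create(int x, int y) {
            return new Point(x, y);
        }

        // The virtual wrapper m: dispatches on the receiver, then calls the static factory.
        // A subclass could override it without the factory itself being virtual.
        Point withX(int newX) {
            return create(newX, this.y);
        }

        int x() { return x; }
        int y() { return y; }
    }

    public static void main(String[] args) {
        Point p = Point.create(1, 2);
        Point q = p.withX(5);
        System.out.println(q.x() + "," + q.y()); // 5,2
        System.out.println(p == q);              // false: a distinct new instance
    }
}
```

The factory stays static and easy to verify, while any need for dynamic dispatch lives in the ordinary method that wraps it.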
If the translation strategy were adjusted to translate every new Foo() expression as invokestatic <new>, the following benefits would appear:
- Less reliance on the verifier to validate arbitrary-in-the-wild “new/dup/invokespecial” code shapes. (It’s been buggy in the past.)
- Simpler, more optimizable bytecode for complex expressions like new A(…new B()…), currently a pain point in our JITs.
- A more direct path for migrating “new VT()” expressions from VT as a value-based class to an inline class. (No new/dup/invokespecial call sites left to migrate.)
- More compact (and analyzable) classfiles, when they contain new A(…) expressions.
- A future option to make the “new instance” instruction be *private* to the class which it is constructing, a probable security benefit.
- A future option to separate, at the language level, the capability of constructing a subclass instance (super()) from requesting a new object (new A()).

John

P.S. About that last option: A public constructor C allows *both* creation of new instances and subclassing. It is difficult to control access for these operations separately. (Subclassing corresponds to calls to C.super.<init>, and instance creation to calls to C.<new>.) If it were possible to tease these apart as separate API points (corresponding to the distinct underlying names), then they could be given independent access control (one public, one private, etc.).

In fact, a clearer separation would be to call the super-version C.super.<super>, so that super() calls could be translated to invokespecial <super> (with the same powers and responsibilities as for <init> in that position), and new T() calls would be translated to invokestatic <new>. And <init> would serve both at once, in various use cases, but a class translation might have only <super> and <new>, or perhaps <super> and <new> and some private <init> methods to factor out code used by both, locally.
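For comparison with today's translation: javac compiles new A(new B()) to new A; dup; new B; dup; invokespecial B.<init>; invokespecial A.<init>, leaving uninitialized objects on the operand stack for the verifier to track. A call to a static factory such as A.create(b) compiles to a single invokestatic. Java source cannot declare <new>, so the sketch below uses a hypothetical static create method as a stand-in. It also shows that java.lang.invoke already presents a constructor in factory shape: Lookup.findConstructor returns a handle whose type is the constructor's parameters with the constructed class as the return type, the same (B)A type as the static factory.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class NewAsFactory {

    static final class B {}

    static final class A {
        final B b;

        A(B b) { this.b = b; }

        // Hypothetical stand-in for a static A.<new>(B): one call allocates and initializes.
        static A create(B b) { return new A(b); }
    }

    /** Handle for the constructor A(B); its type is (B)A, the shape of a factory. */
    static MethodHandle constructorHandle() {
        try {
            return MethodHandles.lookup()
                    .findConstructor(A.class, MethodType.methodType(void.class, B.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Handle for the static factory A.create(B); also of type (B)A. */
    static MethodHandle factoryHandle() {
        try {
            return MethodHandles.lookup()
                    .findStatic(A.class, "create", MethodType.methodType(A.class, B.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the constructor handle and the factory handle have identical types. */
    static boolean sameShape() {
        return constructorHandle().type().equals(factoryHandle().type());
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(sameShape()); // true

        // Either handle yields a fully initialized A from a B in a single call.
        A viaCtor = (A) constructorHandle().invoke(new B());
        A viaFactory = (A) factoryHandle().invoke(new B());
        System.out.println(viaCtor.b != null && viaFactory.b != null); // true
    }
}
```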
I’ll tip my hand here: I think of a <new> method as a “final constructor”. It’s the use of a constructor in the terminal position, when the requested class is known, and *not* when a random subclass is requesting initialization of one of its progeny.

I also think of <super> (or <init> used only by subclasses) as an “abstract constructor”. It’s the use of a constructor in a non-terminal position, when the requested class is some subclass elsewhere but it needs to call up the super chain for proper instantiation. The analogy with final and abstract methods is not exact, but it is close enough that I think there’s something there.

In this mindset, I think of today’s <init> as a hack which performs both jobs, even though they are distinct, and of today’s constructor notation as defining *both* the <new> and the <super> methods, and indeed stashing the one copy of the code on the <init> hack.

When we get bridging technology, we can declaratively spin bridges from non-private <new> and <super> API points (w/o bodies) into private <init> methods. So the extra distinctions I’m thinking of don’t have to end up duplicating bytecodes, in the common case where a class needs to define parallel <new> and <super> API points.
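A rough source-level approximation of this split in today's Java, with hypothetical names: a public static factory plays the “final constructor” (<new>), a protected constructor plays the “abstract constructor” (<super>), and both bridge into one private constructor holding the single copy of the initialization code (the private <init>). It is only an approximation. In Java, protected also grants package access, and because the classes here are nestmates in one file, the private constructor is not actually hidden from the subclass.

```java
// Hypothetical approximation: two API points with independent access,
// both delegating to one private body.
public class SplitConstructors {

    static class Account {
        private final String owner;
        private final long balance;

        // Stand-in for a private <init>: the single copy of the initialization code.
        private Account(String owner, long balance) {
            if (owner == null || owner.isEmpty()) {
                throw new IllegalArgumentException("owner required");
            }
            this.owner = owner;
            this.balance = balance;
        }

        // "Abstract constructor" (<super>): an entry point for subclasses to chain to.
        protected Account(String owner) {
            this(owner, 0L);
        }

        // "Final constructor" (<new>): the public way to request a new Account.
        public static Account open(String owner, long initialDeposit) {
            return new Account(owner, initialDeposit);
        }

        String owner() { return owner; }
        long balance() { return balance; }
    }

    static final class SavingsAccount extends Account {
        SavingsAccount(String owner) {
            super(owner); // chains through the <super>-like entry point
        }
    }

    public static void main(String[] args) {
        Account a = Account.open("ada", 100);
        Account s = new SavingsAccount("bob");
        System.out.println(a.owner() + " " + a.balance()); // ada 100
        System.out.println(s.owner() + " " + s.balance()); // bob 0
    }
}
```

The bridges here are tiny hand-written bodies. With the bridging technology described above, they could be declared without bodies and generated.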
