----- Original Message -----
> From: "John Rose" <[email protected]>
> To: "Brian Goetz" <[email protected]>
> Cc: "Remi Forax" <[email protected]>, "valhalla-spec-experts" <[email protected]>
> Sent: Thursday, July 13, 2023 10:52:38 PM
> Subject: Re: The last miles

> On 13 Jul 2023, at 7:24, Brian Goetz wrote:
>
>> This is a good thought; we split the initialization protocol and its a fair
>> question to ask whether we can go back to a lump.
>>
>> In this case, I suspect John is about to say “Please let’s not give the
>> verifier any more jobs to do.”
>
> It is that, and even worse. If you work the details, you’ll quickly run into
> the fact that the <init> protocol (for Java constructors) builds an object but
> does not return the new object, it takes the new object from the caller in a
> tabula rasa (blank) state, and pokes values into it. Worse, the new object is
> supplied (by a new opcode) from an untrusted (even hostile) client. That means
> that the verifier needs complex rules (>10% of the total complexity) to track
> these untrusted-but-trusted blank objects and make sure they are handed to
> <init> before being used. That’s bad. We have a steady bug stream from this
> very delicate machinery. Maybe it’s done after a quarter century but I
> wouldn’t bet the farm on that.
>
> Worse still, for values, there is no architecturally defined state, for
> values, which corresponds to the “tabula rasa” state of the receiver of an
> <init> call. We know something of that state; it is called a “larval object”,
> but the Valhalla JVMS does not define or rely on it. The proposed
> “unification” would require us to somehow simulate larval objects in terms of
> today’s blank identity objects, and define how the larval-to-adult state
> transition works, or it would have to build new verifier rules for larval
> objects (mutable while <init> runs, then pure values after that). Either
> option seems much worse than what we have chosen to do so far.
>
> What we have chosen to do so far is have a functionally clean model for value
> objects that does not require mutability, either temporary (larval-only) or
> permanent (I shudder at that thought).
> This functionally clean model uses withfield instead of getfield, and
> aconst_init instead of the “new” opcode. I think that is a great trade,
> because it lets us off the hook from defining mutability into values, at any
> stage of their lifetimes.
>
> Yes, serialization smuggles larval mutability back in, but that’s a private
> matter of optimization, between the VM and JDK. I really don’t want to see
> that in the JVMS, because it would be just as hairy and complex and bug-prone
> as today’s new/<init> dance. Yes, we should use the old mechanisms when we
> can, and we do! But the new/<init> dance is, IMO, hopelessly entangled with a
> presupposition of object identity, and also hopelessly buggy; so I don’t think
> it can help us, and I wouldn’t touch to extend it even if I thought it might
> help.
>
> How’s that? :-)

Here, your analysis is based on the assumption that neither the call site nor the declaration site of <init> will change. We are less constrained than that: the call site cannot be changed, but the declaration of <init> can, because recompiling the value class is something users will have to do anyway.

So the new + dup + invokespecial <init> dance has to stay the same, but the semantics of each individual opcode can be adjusted for value classes (it's a lump move), and the content of <init> and even its descriptor can be different.

Here is what I propose:
- inside the value class, <init> should return the instance, so the descriptor should be <init>(LComplex;)LComplex; instead of <init>()V for a constructor with no parameters. Inside the constructor, either "this" or the first parameter is ignored, withfield is used instead of putfield, and the fully initialized instance is returned by <init>. The verifier is updated to understand the opcode "withfield".
- outside the value class, the semantics of the opcode "new" is changed to be the semantics of "aconst_init" if the class is a value class. The semantics of invokespecial Complex <init>()V also changes if Complex is a value class: it takes two instances plus the parameters from the stack and calls <init>(LComplex;)LComplex;

It's not beautiful, it's a hack; as Brian said, it's a lump move. But it's not as bad as you seem to think :)

Rémi
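
For readers who want to see the shape of the proposal above, here is a rough simulation in plain Java. It is only a sketch under stated assumptions: real value classes, withfield and aconst_init cannot be written in Java source, so ordinary immutable objects stand in for them. The names defaultInstance, withRe, withIm, init and create are hypothetical, and the two double parameters (which would give a descriptor like <init>(LComplex;DD)LComplex;) are an illustrative extension of the no-parameter case discussed in the message.

```java
public class InitProtocolSketch {
    // Stand-in for a value class: all fields are final, nothing is ever mutated.
    public static final class Complex {
        public final double re;
        public final double im;

        private Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }

        // Plays the role of "new" reinterpreted as aconst_init:
        // produces the all-zero default instance.
        public static Complex defaultInstance() {
            return new Complex(0.0, 0.0);
        }

        // Play the role of withfield: return a copy with one field replaced.
        public Complex withRe(double value) {
            return new Complex(value, im);
        }

        public Complex withIm(double value) {
            return new Complex(re, value);
        }

        // Plays the role of an <init> that receives an instance and
        // returns the fully initialized one instead of returning void.
        public static Complex init(Complex blank, double re, double im) {
            Complex self = blank;
            self = self.withRe(re);
            self = self.withIm(im);
            return self;
        }
    }

    // Simulated call site: the unchanged new + dup + invokespecial sequence.
    public static Complex create(double re, double im) {
        Complex first = Complex.defaultInstance(); // "new" acting as aconst_init
        Complex second = first;                    // "dup"
        // "invokespecial" consumes both references and leaves the result of init.
        return Complex.init(second, re, im);
    }

    public static void main(String[] args) {
        Complex c = create(1.0, 2.0);
        System.out.println(c.re + " " + c.im); // prints "1.0 2.0"
    }
}
```

Note that the caller never observes a partially initialized object in this simulation: every intermediate state is a complete immutable instance, which is the property the functional withfield model is meant to preserve.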
