Re: Collapsing the requirements

John Rose Tue, 06 Aug 2019 14:03:48 -0700

Good discussion!

On Aug 6, 2019, at 9:50 AM, Brian Goetz wrote:
> 
>> So, legal signatures will be:
>>  - QV;
>>  - LI;
>> and that’s it, right?
>> 
>> Q will continue to have its current semantics (flattenable, non-nullable, 
>> triggers pre/eager-loading).
>> L will continue to have its legacy semantics (indirection, nullable, no new 
>> loading rules)
> 
> Correct.  Nice and simple!


Not completely simple.  The old contract of LV; will haunt us slightly.  
Remember that LG; is a valid descriptor, for any garbage name G even if G 
doesn’t exist.  (E.g., “Lno/such/package/or/type!!;”.)

You can’t find all such LG;.  Therefore, LV; must be allowed as a possibility, 
on the same footing as LG;.

Note that reflecting over LG; will get a CNFE.  And the verifier will make only 
limited accommodation for such types, in effect allowing only “null” into such 
variables.

There’s nothing to be gained by trying to make the rules against LV; more 
strict than those for LG;.  Therefore, the interpretation of LV; should be “as 
if” the string V in that descriptor were truly a non-existing type, to be 
diagnosed at all the same times that any other LG; would be checked and 
diagnosed.
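
As a rough illustration of the LG; contract using today's reflection API (not the verifier), the sketch below resolves a method descriptor with `MethodType.fromMethodDescriptorString`. The class name `GarbageDescriptorDemo` and the type `no/such/pkg/Garbage` are made up for the example. The descriptor string itself is well formed; the failure only appears when something asks for the named class to be found.

```java
import java.lang.invoke.MethodType;

class GarbageDescriptorDemo {
    /** Returns true if every type named in the method descriptor can be found. */
    static boolean resolves(String methodDescriptor) {
        try {
            MethodType.fromMethodDescriptorString(
                    methodDescriptor, GarbageDescriptorDemo.class.getClassLoader());
            return true;
        } catch (TypeNotPresentException e) {
            // Unchecked counterpart of ClassNotFoundException for a missing type.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolves("(Ljava/lang/String;)V"));     // existing type
        System.out.println(resolves("(Lno/such/pkg/Garbage;)V"));  // hypothetical garbage type
    }
}
```

Under the reading proposed here, an LV; descriptor for an inline class V would be diagnosed at the same points as the second call.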

> 
>> 
>>> 
>>> Note that the VM can optimize eclairs about as well as it could for LV; it 
>>> knows that I is the adjunction of null to V, so that all non-null values of 
>>> I are identity free and must be of type V.
>> 
>> Optimizing I might require some knowledge about V, but because V <: I, I 
>> could be loaded while V is not loaded yet.
> 
> If the rule is “always preload Q” (which I think is what John is suggesting), 
> then this case cannot come up, because I’s class file will mention QV.  
> Similarly, the opposite case does not happen either, as we load super types 
> first, so loading V will trigger loading I.  

Yup.  The only truly lazy scenario would be when some API uses only the LI; 
type, as a descriptor, not a CONSTANT_Class.  Then the normal contract for 
L-descriptors applies:  I.class isn’t loaded until there’s some specific need 
for I (as in a CONSTANT_MethodType).

That is pleasingly similar to the situation with today’s primitives and their 
wrappers:  “I” is hardwired but “java/lang/Integer” is not hardwired to the 
same degree (the verifier doesn’t have to load it always, for example).

> Of course, we can twiddle these rules and get different answers, but this is 
> my understanding based on the rules I have heard for load order.
> 
>> 
>>> 
>>> What we lose relative to V? is access to fields; it was possible to do 
>>> `getfield` on a LV, but not on I.  If this is important (and maybe it’s 
>>> not), we can handle this in other ways.
>> 
>> This is related to an open question that shows up in many places in this 
>> document.
>> What should be the nature of V’s super type? An interface or an abstract 
>> class?
>> If it is an abstract class, it could declare and access the fields.
>> The question expands further than just fields: what about method bodies?

Yes, these are interesting questions.  One thing that makes me happier about 
this model is the fact that several of the possible answers require no new JVM 
functionality, but are simply translation strategy decisions.

At the moment, I personally prefer the idea (out of several possible ideas) of 
keeping all concrete functionality inside the inline class V, and lift only API 
surface into I as (i) abstract methods, (ii) supers, and (iii) type variables, 
and further to do this lifting “the old fashioned way” by requiring javac to do 
the copying at compile time.  This is good enough to kick off experimentation 
with the resulting user model, IMO, if not in LW10 then in LW5 (if we need 
margin for adjustment).

Indeed, after that many questions follow, about fields, static methods, the 
role (if any) of non-interface supers such as ValObject (if not an interface), 
the possible role of covariance (or not) on V<:I within the V/I APIs, nested 
classes of V, type inference rules for V and I, support for user customization 
of I, alternative patterns other than I=V.Box, JVM or JLS support for defining 
various bits of the pattern, and so on.  (I’m sure I missed something!)  But 
simply copying the (public!) methods into an otherwise-empty I.class (as 
abstracts, plus supers & typevars) seems a great first cut to me.
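
A minimal sketch of that first cut in today's Java, assuming a final class as a stand-in for the inline class V (inline classes are not expressible in current source). The names `Pair` and `PairBox` are hypothetical, and the interface is written by hand here where javac would do the copying.

```java
import java.io.Serializable;

// Hypothetical companion I: API surface only. It carries the public methods
// (as abstracts), the supers, and the type variables of V, and nothing else.
interface PairBox<T> extends Serializable {
    T first();
    T second();
    PairBox<T> swap();
}

// Stand-in for the inline class V: all fields and method bodies stay here.
final class Pair<T> implements PairBox<T> {
    private final T first;
    private final T second;

    Pair(T first, T second) {
        this.first = first;
        this.second = second;
    }

    @Override public T first() { return first; }
    @Override public T second() { return second; }
    @Override public PairBox<T> swap() { return new Pair<>(second, first); }
}

class EclairDemo {
    public static void main(String[] args) {
        PairBox<String> boxed = new Pair<>("a", "b");  // V widens to I
        System.out.println(boxed.swap().first());
    }
}
```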

>> Should they be in V or in I? This has an impact on the type of ’this’ in 
>> these
>> methods, even if this model has the nice property that ’this’ will always 
>> point
>> to an instance of V (as long as the JVM protects the model, and prevents 
>> external
>> forces (JVMTI, Unsafe, etc.) from breaking the special and unique 
>> relationship
>> between I and V). And the type of ’this’ will also impact the way methods are
>> invoked (invokevirtual vs invokeinterface).
> 
> There’s a longer discussion to be had about bringing abstract classes and 
> interfaces closer together, or allowing abstract class super types of values, 
> and if so, how.  I have some vague ideas of how the VM and language could 
> handle this combination; rather than dive into that now, I’ll just say that 
> here are the places where the concept of inline-extends-abstract-class has 
> come up:
> 
>  - Migrating VBC to inline classes
>  - Inline records (as there is an abstract Record super type)
>  - Whether ValObject is an interface or an abstract class
> 
> Which is to say, we should untangle this knot, which I think is pretty 
> closely related to the RefObject/ValObject knot, so I would think it is best 
> to untangle them together.

+1  I think there are several ways forward on this front, and we can pick a 
good one.

> 
>> 
>>> 
>>> #### With sugar on top, please
>>> 
>>> We can provide syntax sugar (please, let’s not bike shed it now) so that an 
>>> inline class _automatically_ acquires a corresponding interface (if one is 
>>> not explicitly provided), onto which the public members (and type 
>>> variables, and other super types) of C are lifted.  
>> 
>> Does the interface only declare public methods, or does it also provide the 
>> implementation (default method)?
> 
> If we extract an interface from the class mechanically, we would lift the 
> public methods, the super types, and the type variables to the interface.  If 
> the user writes the interface by hand, they will do what they’re going to do.

+1; a good first cut and maybe even the last cut.

> 
>> 
>>> For sake of exposition, let’s say this is called `C.Box` — and is a 
>>> legitimate inner class of C (which can be generated by the compiler as an 
>>> ordinary classfile.)  
>> 
>> Is it a new feature? Or just an idea how it could be implemented in the 
>> future?
>> 
>> Because I’ve tried to compile this:
>> 
>> public class C implements C.Box {
>>    static public interface Box {
>> 
>>    }
>> }
> 
> Yes, we would have to address this.  The cycle here is not a real cycle, in 
> that Box does not depend on C for anything, except it happens to live there.  

As the author of that particular restriction I would support lifting it, at 
least in the case of interfaces, and probably also of any “static” nested 
class.  The proposed inheritance would be ill-founded if the outer were to 
extend a non-static inner, which is why it’s a restriction in the first place, 
but I widened it to a simpler rule out of an abundance of caution.  Time to 
change it.
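
For contrast with the quoted snippet, the inverse nesting involves no inheritance from a member of the subtype and is ordinary Java today. The sketch below uses hypothetical names (`CounterBox`, `Counter`); it is not the proposed `C.Box` shape, only a layout that avoids the restriction until it is lifted.

```java
// The interface is the outer type; the implementing class is nested inside it.
interface CounterBox {
    int value();

    // A member class of an interface is implicitly public and static,
    // so there is no enclosing instance and no inheritance cycle.
    final class Counter implements CounterBox {
        private final int value;

        Counter(int value) { this.value = value; }

        @Override public int value() { return value; }
    }
}

class NestingDemo {
    public static void main(String[] args) {
        CounterBox box = new CounterBox.Counter(3);
        System.out.println(box.value());
    }
}
```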

> 
>>> #### Boxing conversion
>>> 
>>> Given the constraints of the eclair relationship, it would be reasonable 
>>> for the compiler to derive from this that there is a boxing conversion 
>>> between C and I (I is just the value set of C, plus null — which is the 
>>> relationship boxes have with their corresponding primitives.)  The boxing 
>>> operation is a no-op (since C <: I) and the unboxing operation is a null 
>>> checking cast.
>> 
>> Could we assume that boxing/unboxing would be handled by the static compiler 
>> (like primitive boxing today),
>> and there’s no expectation that the JVM will do magic boxing when needed? 
>> (Not considering auto-bridges yet).
> 
> Yes.  In fact, we only need this in one direction; since C <: I, the 
> conversion C -> I comes for free (subtyping), it is only the conversion I -> 
> C that would require an unboxing conversion.  The compiler would introduce 
> the necessary casts (which the VM can optimize to null checks.)  

It’s less than the full boxing/unboxing pattern, since “boxing” is subsumed by 
simple widening to a super.  Also, “unboxing” is just a cast (narrowing to a 
sub).  We might need a new term to express this hybrid between full-on 
“unboxing” and a plain casting conversion, so the JLS can say “unboxing and 
devoxing” (or whatever) wherever today’s unboxing comes into play.
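
The two directions can be sketched in today's Java as follows, assuming a final class `Money` as a stand-in for the inline class C and a hand-written interface `MoneyBox` as its I (both names are hypothetical). Widening needs no code at all, and narrowing is a cast guarded by a null check.

```java
import java.util.Objects;

interface MoneyBox {                     // hypothetical companion I
    long cents();
}

final class Money implements MoneyBox {  // stand-in for the inline class C
    private final long cents;

    Money(long cents) { this.cents = cents; }

    @Override public long cents() { return cents; }
}

class BoxingDemo {
    /** "Boxing": plain widening to a super, since C <: I. */
    static MoneyBox box(Money m) {
        return m;
    }

    /** "Unboxing": a narrowing cast that also rejects null. */
    static Money unbox(MoneyBox b) {
        return (Money) Objects.requireNonNull(b, "null has no inline value");
    }

    public static void main(String[] args) {
        MoneyBox boxed = box(new Money(250));
        System.out.println(unbox(boxed).cents());
    }
}
```

Because only `Money` implements `MoneyBox` in this sketch, the cast itself cannot fail for non-null inputs, which is why the check reduces to a null check.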

> 
>>> 
>>> The world is indeed full of existing utterances of `LOptional`, and they 
>>> will still want to work.  Fortunately, Optional follows the rules for being 
>>> a value-based class.  We start with migrating Optional from a reference 
>>> class to an eclair with a public abstract class and a private value 
>>> implementation.  Now, existing code just works (source and binary) — and 
>>> optionals are values.  But, this isn’t good enough; existing variables of 
>>> type Optional are not flattened.
>> 
>> Notable difference with previous statements: here the eclair is made of an 
>> inline class and an abstract class
>> (instead of an inline class and an interface). I assume this is for backward 
>> compatibility (Optional’s methods
>> are currently invoked using invokevirtual and not invokeinterface).
> 
> Correct.  There are multiple ways to handle this.  One is to allow eclairs 
> with abstract classes; another is to blur the distinction between abstract 
> class and interface so that we can make Optional an interface and support the 
> invoke virtual callsites in the wild.  I think I prefer the former, but once 
> we start to untangle the ValObject/RefObject knot, I suspect we’ll know more.

My long-term wish list of JVM cleanups already includes deprecating 
invokeinterface and upgrading invokevirtual to cover its job.  This is a fine 
time to think about doing that.

> 
>> Having V’s super type be an abstract class, some additional issues have to 
>> be considered.
>> If both V and V’s super are classes (abstract or not), they both can declare 
>> fields, so they
>> could end up having different layouts. Even if javac checks against that, 
>> manually crafted
>> class files and instrumentation frameworks injecting fields (with 
>> redefineClass) could create
>> situations where a mismatch exists between V and V’s super.
>> Would this cause issues? Should the JVM guard against that? To be 
>> investigated.

Indeed.  My thought here is that fields inherited into an inline type would be 
completely taken over by the inline type; the layout of the abstract super 
would *not* be reused, so there *would* be mismatches between V and its super.  
We’d have to distinguish carefully between uses of fields inside an inline 
instance (which are always “full custom”) and fields inside a classic 
“identity” (indirect) instance, which are always set inside the super and 
inherited as a full layout.  Unsafe field offsets would be subject to 
restrictions:  You can always use them on the declaring class if it’s concrete, 
but if a field is inherited into an inline you somehow have to determine the 
field offset relative to the particular inline class.  These restrictions apply 
to numeric offsets.  For symbolic references the problem is probably not so 
bad.  We can probably mandate that a symbolic reference to an inherited field 
must mention the inline type using the field, not the abstract declaring it; a 
similar effect is already obtained by the rules of protected fields.  Maybe we 
get some useful leverage from mandating that all fields inherited into an 
inline are protected?  Just brainstorming here… As Brian says, there are 
details to work out.

I’d be happy to exclude fields for now and have abstract superclasses define 
only behavior and statics, not instance state.  Or, if the problem is confined 
to just ValObject and methods of the Object protocol, I’d be OK with making 
ValObject be an interface, *but* a special one that can hold methods for the 
Object protocol (which is forbidden for most interfaces), and maybe also final 
methods (which is also forbidden for interfaces).  Like I say, we have multiple 
options.

> 
> It would have to be worked out.  I think John said something like “let the 
> language guard against inline value classes extending inappropriate abstract 
> classes, and if the VM sees inline class extend an abstract class, ignore the 
> fields and the ctors.”  This is probably a reasonable 
> first-order-approximation if we decide to go this route.

Yeah; if there are no fields then I think we can make a structural rule that 
the abstract’s constructor is just as free of behavior as that of an interface.  
The JVM can verify that it is a bare call to Object.<init>()V or whatever is 
next up the chain, and javac would forbid constructors to be coded.
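
The no-fields half of that structural rule can be approximated with reflection, as in the sketch below. The class and method names are hypothetical, and the check is only an illustration of the rule, not how the JVM or javac would enforce it. The constructor half (a bare call up the chain) is not visible through reflection and is not checked here.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class StatelessSuperCheck {
    /** True if c and all of its superclasses declare no instance fields. */
    static boolean isFieldFree(Class<?> c) {
        for (Class<?> k = c; k != null; k = k.getSuperclass()) {
            for (Field f : k.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers())) {
                    return false;  // instance state found somewhere in the chain
                }
            }
        }
        return true;
    }

    // Behavior and statics only: acceptable as a super of an inline class
    // under the rule sketched above.
    abstract static class Shape {
        static final double UNIT = 1.0;
        abstract double area();
    }

    // Carries instance state: this is the case with complications.
    abstract static class Tagged {
        protected String tag;
    }

    public static void main(String[] args) {
        System.out.println(isFieldFree(Shape.class));
        System.out.println(isFieldFree(Tagged.class));
    }
}
```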

If there are fields then there are complications to work out…  But I’ll stop 
brainstorming/ratholing now, since there’s more important stuff to consider.

> 
>> 
>>> There are a few ways to get there.  One is to treat this problem as 
>>> protecting such classes from uninitialized fields or array elements; 
>>> another is to ensure that such classes (a) have no public fields and (b) 
>>> perform the correct check at the top of each method (which can be injected 
>>> by the compiler.)  I don’t want to solve that problem right here, but I 
>>> think there are enough ways to get there that we can assume this isn’t a hard 
>>> requirement.
>> 
>> Would (b) be applied to non-static inner inline classes, or are they 
>> definitively considered as a lost cause?
>> Currently they can throw a NPE which is not so bad after all.
> 
> Depends on how early we can guarantee that NPE.  If the class might do a 
> bunch of side effects before hitting the dereference of the outer pointer, 
> then we might leave things in an inconsistent state.  If we can fail faster, 
> that is good.  This area definitely needs investigation.

Suppose C.IV is an inner inline class of C, in which both C.this and IV.this 
are in scope.  It might be OK if IV.default is NPE-happy *if* we can make it 
harder to observe.  There are lots of things we could try to do this, but 
perhaps the simplest thing to do is take Remi’s remedy, to confine such types 
to a non-public role.  A companion interface could be placed into the public 
API of C, as a replacement, and C would be free to pass around either null or 
valid instances of C.IV (but not IV.default, which C would avoid).  Since IV 
would be non-public, “nobody but family” would be making arrays or fields of 
type IV.  Maybe that’s enough; I think it’s worth an experiment.  Maybe more 
specific tactics would work also, such as having the JLS forbid uninitialized 
fields and array construction of type C.IV *outside of C’s nest*.  The JVM 
would allow such things, and they’d have NPE risks, but non-family would be 
firmly discouraged by the language from declaring variables that initialize to 
IV.default.  An explicit mention of IV.default is probably also to be 
discouraged; if C wants to export a constant that exposes this NPE-risky value, 
that’s the business of C’s author, but the language can forbid it outside of 
C’s nest.  Lots to talk about here, but we have (as I say) multiple options 
that seem OK.
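
A sketch of the confinement idea in today's Java, under loud assumptions: `Owner` plays the role of C, the private class `Slot` stands in for the inner inline class C.IV with an explicit field in place of C.this, and `Slot.DEFAULT` imitates IV.default as an all-zero value. All names are hypothetical. The public API mentions only the companion interface `SlotView`.

```java
import java.util.Objects;

class Owner {
    private final String name;

    Owner(String name) { this.name = name; }

    /** Companion interface: the only type that appears in Owner's API. */
    public interface SlotView {
        String describe();
    }

    /** Stand-in for the inner inline class; 'outer' plays the role of C.this. */
    private static final class Slot implements SlotView {
        // Imitates IV.default: every field zeroed, so the outer pointer is null.
        static final Slot DEFAULT = new Slot(null, 0);

        private final Owner outer;
        private final int index;

        Slot(Owner outer, int index) {
            this.outer = outer;
            this.index = index;
        }

        @Override public String describe() {
            // Fail fast, before any side effects, on the all-zero value.
            Objects.requireNonNull(outer, "default Slot has no enclosing instance");
            return outer.name + "#" + index;
        }
    }

    /** Normal path: only valid instances (or null) leave the family. */
    SlotView slot(int index) { return new Slot(this, index); }

    /** Exporting the risky value is a deliberate choice by Owner's author. */
    static SlotView riskyDefault() { return Slot.DEFAULT; }

    public static void main(String[] args) {
        System.out.println(new Owner("c").slot(1).describe());
    }
}
```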

FTR, I think Remi’s remedy (of confining inlines to non-public) is a little too 
restrictive for inlines in general, although maybe it’s a conservative thing to 
try in LW5, when we are running user model experiments.  If we want to start 
using inlines as “new numerics” (B-float, etc.) it’s really unfriendly to 
require users to encounter them only via their companion interfaces.

> 
>> The model looks promising, but a more precise specification of eclairs would 
>> be helpful
>> to estimate the impact on the JVM:
>> 
>>  - What is the nature of V’s super?

Low impact option for JVM:  Just an interface for starters, to be adjusted for 
ValObject later as needed.

>>  - How are fields/methods declared/implemented between V and V’s super? 

Low impact:  javac takes responsibility for copying stuff up from V to V’s 
super.  JVM just reads the class file.

>>  - Are there any special requirements regarding static members between V and 
>> V’s super?

Low impact:  JVM just does what it’s told according to whatever is in V.class 
and I.class, using standard rules.

>>  - Is there a requirement that V and V’s super share the bodies of their 
>> non-static public methods?

Low impact:  javac plants abstract methods inside I.class and the JVM does what 
it’s told.  (Javac is handling the Mirandas this time.)

> Good questions, I hope to have answers eventually.  If you have preferred 
> answers, please share your thinking

(Done for my part, see above!)

— John
