Re: Draft JVMS changes for Nestmates

David Holmes Thu, 20 Apr 2017 01:22:45 -0700

Hi Dan,

I'll try not to double up on things already discussed elsewhere. Andavoid bikeshedding :)


On 20/04/2017 5:21 AM, Dan Smith wrote:

Thanks a lot for the feedback! My comments below:

On Apr 19, 2017, at 12:40 AM, David Holmes <david.hol...@oracle.com> wrote:

JEP name

I don't know how permanent JEP names are supposed to be, but I'd prefer a different name at this 
point. Something like: "Expanded JVM Access to Private Members"—shorter, focused on the 
feature itself rather than its relationship to the Java language. Or maybe "Class Nests for 
Access to Private Members".


The intent of the name is to aid in the contextual positioning of this JEP ie 
address why do we even need this. The JEP addresses an inconsistency between 
access controls in the Java programming language and access controls in the 
JVM. The primary goal here is to align the two, while at the same time allowing 
for additional usage of nestmates in the future.


Yeah, but I think explaining the motivation in the title makes for a long title 
and limits the imagination to one particular use case.

The JEP uses "nest top" to describe the class that nest members reference; I prefer "host class", 
which better describes the class's role and isn't tied to the Java "top level class" concept. I know we use 
"host class" internally in Hotspot, perhaps when working with anonymous classes (of the JVM flavor), but I 
think in that context it will ultimately mean the same thing? Are we comfortable repurposing the term in this way?


Given that a nest-top type must be a top-level type, I think nest-top fits 
perfectly.


This is a good example of my point about the title: the JVM has no concept* of "top 
level type", and doesn't need one. We just need a specific class to act as the 
reference point by which a group of classes can talk about a nest.

But we are attempting to justify this by relating to the impedancemismatch with the Java language. And we do utilize the notion of"top-level class" (as defined by JLS) elsewhere.

For example, I can imagine some language introducing a class 
file—foo.bar.Nest$$001—solely for the purpose of providing a hook for this 
reference point. javac's choice to designate a top-level class as the 
nest-top/nest-host is a compilation strategy.

(*Yes, there's an InnerClasses attribute, and Signature attributes, and lots of 
other stuff to facilitate compilation in the Java language. But those represent 
auxiliary metadata, not core JVM concepts.)

I dislike "host class" because that is already used for the quite different case of 
VMACs. I can live with other names but I think they should incorporate "nest" eg. 
nest-holder, nest-host. But I think nest-top is an ideal match. (And no I didn't invent this 
terminology :) ).


Okay. Probably best to sit on this for awhile and circle back to it, see how 
we're all feeling.


I think nest-host is starting to take the lead :)

I follow Brian's model when it comes to nest membership (5.4.4): every class 
belongs to a nest, possibly (in the absence of MemberOfNest) the nest hosted by 
itself. Many nests are singletons without any associated explicit attributes.


I have not yet seen a follow up from John as to whether we require or just 
allow, an empty NestMembers attribute to indicate a singleton nest member.


If in our model *every* class has a nest, then we definitely don't want to 
require every class to have a NestMembers attribute.

In my spec (5.4.4), the presence of a NestMembers attribute is irrelevant. A 
class without a MemberOfNest attribute belongs to its own nest, and there's no 
need to validate anything about NestMembers.

Verification of MemberOfNest

I include a discussion block about different options of validating 
MemberOfNest. I think the consensus, and my preference, is to do it during 
verification of the member class. (NestMembers, on the other hand, is never 
validated, except as a side-effect of checking MemberOfNest.)


Given verification is optional the current approach is to validate during 
linking just prior to the verification step. Whether that can conceptually be 
considered part of verification I am unsure.


I think this is the source of a lot of anomalies—spec treats verification as an 
essential part of linking, while Hotspot treats it as an optional add-on.

You should be okay, for some definition of "just prior". These are the 
constraints, as I read them:
- "A class or interface is completely loaded before it is linked." (5.4)
- Loading includes structural checks (CFE), version checks (UCVE), and 
validation and loading of superclasses and superinterfaces (5.3.5)
- "Errors detected during linkage are thrown at a point in the program where some 
action is taken by the program that might, directly or indirectly, require linkage to the 
class or interface involved in the error." (5.4)

That last one is pretty mysterious to me (what does "some action" mean? what about 
"indirectly"?), but seems to grant pretty wide latitude on the timing.

I believe the intent with that is to preserve the notion of preciseexceptions that the language has. Exceptions should be seen to be throwndue to a specific action of the program that required a class to beloaded/linked/initialized and which subsequently threw an exception.

Anyway, yes I think I'm okay with validation during linking. That saidthe reason I placed it before verification was because verificationcould require the nestmate-access check if it encountered aprivate-invokespecial. But as per discussion elsewhere (and perhapsbelow) your spec changes do not require the nestmate check, so theproblem I was addressing no longer exists.

Verification of invokespecial

Allowing invokespecial to refer to classes other than the current class and its 
supers is a significant change. I noticed and relied heavily on the parallel 
with invokevirtual making protected method calls. So I tried to make the two as 
similar as possible. In a few places, the treatment of protected methods 
doesn't seem ideal, and rather than trying to mirror that with non-private 
invokespecial, I modified the protected method treatment.


This concerns me because protected-access is a somewhat complex/messy issue so 
aligning private-access with it seems to go in the wrong direction. Cleaning up 
the protected access rules, while perhaps desirable, is technically 
out-of-scope for this JEP in my opinion.

The "protected check" of verification, in particular, was a mess before, and I 
think I've made it a lot more manageable, and compatible with a parallel rule for 
invokespecial. I could use some feedback on exactly what Hotspot's verifier has been 
doing here, though, since I'm pretty sure it didn't match the old specified rules.


Might be better to take this up separately.


Yeah, I hear you. But I worry about the technical debt if we just look the 
other way and specify/implement a fresh invokespecial check separately. The 
subtle inconsistencies between the two will lead to all sorts of bugs and 
tweaks in the future; the duplicate code/spec will be an ongoing maintenance 
burden.

In terms of implementation the verification of invokespecial involves anumber of checks, including the subtype check. What I have been doing issimply ignoring the result of the subtype check if I am dealing withnestmates. With the proposed spec for private invokespecials I cansimply skip the subtype test altogether. It is unclear what theramifications are of the changes to protected access on theimplementation. The mapping from rules to code is not obvious to me.

From the spec perspective, pulling on the thread looks like this:
- The new verification we need on invokespecial looks almost the same as the 
verification we need on protected methods.
- Okay, I'll re-use that rule. But it's a complicated mess, and has bugs, so 
I've got to clean it up first.
- What exactly are these rules trying to achieve? I can see the hard 
constraints (in configuration X, an error occurs), but there's a soft 
requirement to minimize class loading.
- To figure out when class loading is acceptable, I need to know what the 
reference implementation actually does.


I just don't like the idea of messing with the protected access checks.

MethodHandle resolution

The spec (5.4.3.5) is vague about what errors can occur during MethodHandle 
resolution. I assume any linkage error specified for the instruction can also 
occur via MethodHandle resolution, and that will include failures due to 
invokespecial improperly referencing a class.


This is an area I have least understanding of, but this seems to cover it:

"In each step, any exception that can be thrown as a result of failure of resolution 
of a class or interface or field or method reference can be thrown as a result of failure 
of method handle resolution."


The ambiguity is in the definition of "resolution". The preferred 
interpretation, which everyone seems to confirm reflects reality, is that it includes all 
linkage errors specified for the referencing instruction. I.e., errors from both both 
5.4.3 and 6.5 can occur.

Yes. 5.4.3 states that "Certain of the instructions above requireadditional linking checks when resolving symbolic references." And thatthose linking exceptions are listed with the instructions themselves. Soyes resolution errors consist of the union thereof.

Dynamic checking of invokespecial receivers

When invokespecial involves interface types, the verifier can't guarantee that 
receiver objects actually implement that interface (JDK-8134358). It's an open 
question whether this is really a problem, but I included a tentative fix in 
the invokespecial runtime rules.


This seems out of scope in the general case, but there may be nestmate specific 
actions required.


I touch it here because:
- It's a dynamic companion to verification, and we've modified the verification 
rules
- There's an existing rule that may or may not be intended to perform part of 
this dynamic check; that needs to be addressed somehow
- Any rule we introduce is impacted by the new treatment of private methods, so 
easier to get it right while working on this feature
- Easier to address small things like this when it's swapped in already

If necessary, we can yank it and address the bug separately in the future, but 
it seems convenient to tackle it now.

Compiler changes

The JEP text can't seem to decide if compiler changes are part of it, or a 
follow-up exercise. I think we're best off explicitly including the compiler 
changes, which will provide opportunities for design validation and testing.


Not sure what gives you that impression, but javac changes are an essential 
part of this. Lacking a compiler spec I assume this will be handled informally. 
But the attribute definitions (see comments below) should dictate what a source 
compiler is required to do.


- Would be clearest if "Goals" mentioned modifying javac
- Language like "javac can" suggests optional followup work; should have a paragraph 
saying "javac will" instead
- Remove the paragraph saying "it is not strictly required to remove these 
immediately as part of this feature"


Agreed - I think I already gave similar feedback but it didn't get taken up.

Security risk

The JEP text should acknowledge that, while this does allow compilers to grant 
finer-grained access to members shared by nestmates, it also pushes compilers 
to grant broader access to members that were previously kept private. It's a 
trade-off, and presumably a good one because nestmates are completely trusted, 
while package-mates might sometimes be suspect.


I'm not following you here. What broader access is being granted to members 
that were previously kept private? This effort allows private-only access to 
members that were previously package-accessible.


Old approach is to widen access to private members to the package level on a 
per-member basis.

New approach is to widen access to private members to the nest level on a 
per-class basis.

Nest is narrower than package (good!), but per-class is more permissive than 
per-member (bad). Some members that are compiled as totally private in JDK 9 will be 
incidentally shared with all nest members in <future JDK version>.

I'm not sure how you are coming at this. For a private member in 9 to beincidentally shared in <future-JDK> the class of the member has to beadded to a nest. With no context as to how this may happen I can'treally comment further. From the Java language perspective if the classis in a nest then such access to private members already exists.

4.7 Attributes

Regarding the Exceptions attribute, it seems to be used only to store data to 
be passed back to the libverify and the reflection API.


I'm not familiar with libverify, but I think you're confirming that the 
attribute is on par with Signature? Something for tools and standard APIs to 
rely on, but irrelevant to execution?


Right - sorry that wasn't clear.

Regarding the definitions of NestMembers and MemberOfNest ... I modelled the 
definitions in:

https://bugs.openjdk.java.net/browse/JDK-8177020

based on those for the innerclass (and related) attributes. I'm not sure why 
you moved away from those definitions as they pinned down exactly when these 
attributes are expected to appear. Without those parts the current definitions 
seem optional - ie they define what the attribute means if present, but they do 
not require its presence. I do not believe these attributes should be optional 
for classfile version 54, but required whenever a nest does exist. A java 
source compiler writer should look at these definitions and know when they must 
generate these attributes.


These sorts of assertions belong in the "Java Compiler Specification". (Yes, 
this document does not exist. Yes, that's kind of a problem, but we muddle through.)

Part of that muddling through, to me, is the ability for a compilerwriter to understand when they must generate certain classfile contents.Hence my approach here.

In an ideal world, JVMS would not acknowledge that the Java Programming 
Language exists. It should stand on its own, and Java is merely a client.

As is, we have some history that has led to some dependencies on Java in spec, 
but these are generally auxiliary items: attributes that don't impact JVM 
behavior (4.7), a tutorial on how to compile Java code (3), occasional 
explanatory references to Java language concepts (e.g., 2.9).

But it would be wrong for the specification of a "critical" attribute (4.7) to 
be defined in terms of Java language concepts. It should stand on its own, general enough 
to meet the needs of Java language compilers and others with different use cases.

I think you have somewhat rose-coloured glasses on if you think the JVMSis only tainted by the Java language in "auxiliary items". :) The wholeset of rules around invokespecial pertain specifically to how itsupports the Java language requirements for constructor invocations andsuper-type invocations. Otherwise invokespecial would have remained asits original invokenonvirtual - which makes more sense to me even now.

4.7.26 The MemberOfNest Attribute

"the class is implicitly a member of its own nest"

I suggest changing "a" to "the sole". Otherwise "a" implies there may be other 
members. As these are static properties being defined I don't think we need to be accounting for some future 
dynamic expansion of a nest.


I intended to include both the cases of a singleton nest and the case of a 
class that has a NestMembers attribute. But I'll add a clarifying sentence.

4.9.2 Structural Constraints

The changes to invokespecial seem okay, but I have a problem with terminology. 
In:

"If invokespecial is used to invoke a non-private, non-<init> method, referenced via 
a superclass or a direct superinterface, ..."

what does "referenced via" mean?


I'll change to "where the referenced class is"

Ok.

In:

"If getfield or putfield is used to access a protected field referenced in a 
superclass ..."

and following, you changed "declared" to read "referenced". Again I do not understand 
what this is supposed to mean. It is the declaration site of the field or method that is needed to determine 
whether that member is in the same or another package. To me "declared in a superclass ..." was 
exactly correct. Your note does not make the change any clearer to me.


Good catch. It's important that the rule be limited to cases that reference a 
superclass*; but it's also important that the declaring class be used for the 
package check.

(*Here's why: if I reference my subclass, and that resolves to a field declared 
in my superclass, the verifier will perform no check, because it may not know 
that the referenced class is a subclass.)

Revised:

"If `getfield` or `putfield` is used to access a `protected` field by reference to a 
superclass, and that field is declared in a different run-time package than the current 
class or interface, then the type of the class instance being accessed must be assignable 
to the current class or interface."

I still think "declared in" is the correct terminology. There's a nuanceto how you are expressing this that I'm just not getting.

4.10.1 Verification by Type Checking

Does the order in the rules mandate the order in which the VM must actually 
check things? classHasValidNest seems much more complex than the actual runtime 
process of validation (though it may just be exposing distinct steps that are 
lumped together at runtime).


Probably not, but dependencies between inputs/outputs do suggest an ordering.

(We could probably get into an argument with a logician or a Prolog expert about whether 
there's such a thing as an "output" at all, and whether it would be acceptable 
for a JVM to load every class in the universe in order to test certain predicates, which 
is why it's probably not a great idea to specify the verifier using Prolog rules. Alas.)

The important elements are:
1) Resolve the host class using the current class's loader
2) Check that the host class belongs to the same package
3) Check that the current class is named in the host's NestMembers
4) Resolve the current class's name in the host class's loader, and make sure 
it produces the same class


---

4.10.1.8 Type Checking for Restricted Member References

I'm not very good at reading or understanding these rules, but I'm surprised 
that the invokespecial rules seem to make no mention of nestmates at all:

"A method reference is allowed by an invokespecial instruction if it is not a 
restricted invokespecial reference, ..."

This seems to open things up to all private method references??


There's a careful partitioning of error conditions that has to happen in order 
to avoid too much class loading.

4.10.1.8 is only concerned with ensuring, in narrow cases, that the type on the stack is 
assignable to the current class. For invokespecial, the narrow case is a method reference 
naming a superclass/direct superinterface and a method not named "<init>", 
where the resolved method is not private.

Separately, 5.4.4 ensures that resolving a private method will produce an 
IllegalAccessError if the method is not in the same nest.

Also separately, 6.5 ensures that if the resolved method is not private, the 
referenced class is the current class, a superclass, or a direct superinterface.

And also separately, 4.10.1.9 ensures that the type on the stack is always 
assignable to the referenced class.

Okay. I have touched on this elsewhere but worth reiterating I think. Ihad been working on the expectation that an attempt to useprivate-invokespecial when the particpants were not nestmates would failverification. I added an explicit nestmate check to the verifier logic.But instead you're loosening the verifier to simply accept anyprivate-invokespecial. Then the actual invocation is access-checked.

That's okay - it simplifies things somewhat and avoids the potentialneed to load a foreign nest-host during verification. I'm just a littlesurprised. :)

5.4.4 Access Control

I don't think you need to restate:

"A class with a MemberOfNest attribute belongs to the nest hosted by the 
referenced host class.

A class without a MemberOfNest attribute implicitly belongs to a nest hosted by the 
class itself. (If the class also lacks a NestMembers attribute, then the nest has 
only one member.)"

The rule is simply stated as is: "belonging to the same nest as D" - belonging 
to the same nest is, or should be, already defined elsewhere.


My intent is that this *is* the definition. But sounds like you're expecting 
that definition to be in 4.7.25/4.7.26, which is probably more intuitive. I'll 
change that.

I'm avoiding commenting on the protected access changes but do want to raise a 
concern that there are no changes to Method resolution (5.4.3.3) yet resolution 
relies on the definition of access control to prune/discard inaccessible 
methods. It appears now that we're allowing more potential successful 
resolutions, then using the additional rules on the invoke* bytecodes to try 
and discard any undesirable results.


Yeah, this motivated my question about MethodHandle resolution: it seems that when we 
talk about "resolution", we really mean the process defined in 5.4.3, followed 
by any linkage checks defined in 6.5. If so, this is just a presentational reshuffling 
(with the minor exceptions I noted in the document).

Why bother? Because if the check on the referenced class of a protected 
invokevirtual/getfield/putfield is considered an access check (5.4.3.3), then the check 
on the referenced class of a non-private invokespecial probably ought to be considered an 
access check, too, but that leads to a concept of "accessible" that is really 
unwieldy and context-dependent.

6.5 getfield (and others)

You note "These rules are redundant: verification already guarantees them " but 
this brings me back to an area I keep raising concerns about: the split between static 
verification based checks and dynamic runtime checks. Yes the verifier precludes certain 
things but if we run without verification the rules expressed for the bytecodes are 
considered to be still required at runtime. If you delete them because they overlap with 
verifier rules we have no way to tell which rules must be enforced regardless of 
verification status.


I propose removing them because on their face they appear completely redundant, 
an artifact of a time when we didn't have clear definitions of loading, 
linking, verification, etc. But if we decide it really is helpful to keep some 
or all of these assertions, that's fine.

It seems to me that attempting to interpret unverified bytecodes is an 
implementation-specific feature that JVMS knows nothing about. Deciding which 
rules must be enforced in the absence of verification is an important part of 
designing that feature, and in an ideal world some comprehensive documentation 
about which verification assertions you've chosen to enforce dynamically would 
be maintained somewhere, but not in JVMS. If we want JVMS to maintain that 
list, then running without verification should be a first-class fully-specified 
feature.

The status quo, I'm guessing, is that a few dynamic checks are sprinkled 
throughout JVMS, but plenty of other checks are performed without any 
supporting spec text. Which isn't a great place to be in.

Okay this remains a problem area for me as I have flagged elsewhere. Ihad assumed the constraints in 6.x defined runtime constraints that hadto be enforced regardless of whether or not verification had occurred.

6.5 invokespecial

You note regarding the IllegalAccessError:

"This replaces a VerifyError previously specified by 4.9.2. The check must be 
delayed until after resolution in order to determine whether the referenced method is 
private."

We know the access flags for the referenced method at verification time - 
shouldn't that be the sole basis for the verifier check? Afterall it is 
verifying the static properties of the classfile and bytecode. If the actual 
resolved method has different access to the referenced method then that may 
lead to ICCE (depending on the exact nature of the change - a private method 
made public may not be an issue for example).


Example:

invokespecial SomeOtherClass.foo(I)V

Verification relies on the descriptor to tell it that this method expects an 
int and returns nothing. Those kinds of checks can be performed locally.

The descriptor doesn't tell you if the method is private. Only way to find out is to 
resolve SomeOtherClass, then resolve "foo(I)V".

If we wanted to do the check at verification time, we'd have to load every 
class named by an invokespecial instruction.

Now I am confused. I thought the access modifier of the method describedby the Method_Ref was available to us. If it is not then how can allthese verification rules be expressed in terms of non-private when we donot have the means at verification time to determine if the access isprivate or not ???

(I think you might be conflating declaration-site metadata, which we have easy 
access to, with use-site metadata, which requires resolving references.)

5.4.3.3 Method resolution

You did not make any changes here, but as per my comment in the bug report we do require, IMHO, 
further tightening here to ensure nestmate invokespecial invocations do not resolve to method 
implementations that they should not. The example here is a hierarchy of nestmates (C extends B 
extends A) where A and C both declare a private method "void m()" and we have an 
invocation c.m() where c is an instance of class C. We then "separately compile" C to 
remove the definition of m(). At runtime method resolution will locate A.m, even though the method 
reference was for C.m. Normally the accessibility rules would exclude A.m from being a viable 
candidate but here the classes are nestmates so A.m is accessible. But it would be wrong to invoke 
A.m. This case should throw ICCE or NSME.


You're taking issue with the fact that method resolution will match private 
methods in superclasses, when our intuition (and the Java language model) is 
that private methods are not inherited.

I agree: https://bugs.openjdk.java.net/browse/JDK-8021581


Interesting - yes this is a similar problem.

Is this the right venue to address that issue? As demonstrated above, I'm happy 
to lump in tangentially-related bug fixes. :-)

You could argue that this situation isn't so different from the status quo, but 
it does seem unique that you can delete 1 private method from a 
consistently-compiled program and then get a silent behavioral change.

Yes. The code was written to expect C.m to execute not A.m - the codedoesn't even know what A.m might do!

In practice, javac won't separately compile nestmates, so this would be hard to 
reproduce. More generally, the feature is not designed for separate 
compilation, so real-world scenarios may be hard to come by, even in other 
languages.

Right - it is handcrafted example in the current context. But I have noidea how a non-Java language may utilise nestmates or allow separatecompilation to occur.

I addressed this in the prototype by ensuring that the declared class ofthe resolved method equals the class of the method from the methoddescriptor.


Thanks,
David

Anyway, I'm game to take another look at JDK-8021581 if we want.

—Dan

Re: Draft JVMS changes for Nestmates

Reply via email to