Hi Dan,
I'll try not to double up on things already discussed elsewhere. And
avoid bikeshedding :)
On 20/04/2017 5:21 AM, Dan Smith wrote:
Thanks a lot for the feedback! My comments below:
On Apr 19, 2017, at 12:40 AM, David Holmes <david.hol...@oracle.com> wrote:
JEP name
I don't know how permanent JEP names are supposed to be, but I'd prefer a different name at this
point. Something like: "Expanded JVM Access to Private Members"—shorter, focused on the
feature itself rather than its relationship to the Java language. Or maybe "Class Nests for
Access to Private Members".
The intent of the name is to aid in the contextual positioning of this JEP ie
address why do we even need this. The JEP addresses an inconsistency between
access controls in the Java programming language and access controls in the
JVM. The primary goal here is to align the two, while at the same time allowing
for additional usage of nestmates in the future.
Yeah, but I think explaining the motivation in the title makes for a long title
and limits the imagination to one particular use case.
The JEP uses "nest top" to describe the class that nest members reference; I prefer "host class",
which better describes the class's role and isn't tied to the Java "top level class" concept. I know we use
"host class" internally in Hotspot, perhaps when working with anonymous classes (of the JVM flavor), but I
think in that context it will ultimately mean the same thing? Are we comfortable repurposing the term in this way?
Given that a nest-top type must be a top-level type, I think nest-top fits
perfectly.
This is a good example of my point about the title: the JVM has no concept* of "top
level type", and doesn't need one. We just need a specific class to act as the
reference point by which a group of classes can talk about a nest.
But we are attempting to justify this by relating to the impedance
mismatch with the Java language. And we do utilize the notion of
"top-level class" (as defined by JLS) elsewhere.
For example, I can imagine some language introducing a class
file—foo.bar.Nest$$001—solely for the purpose of providing a hook for this
reference point. javac's choice to designate a top-level class as the
nest-top/nest-host is a compilation strategy.
(*Yes, there's an InnerClasses attribute, and Signature attributes, and lots of
other stuff to facilitate compilation in the Java language. But those represent
auxiliary metadata, not core JVM concepts.)
I dislike "host class" because that is already used for the quite different case of
VMACs. I can live with other names but I think they should incorporate "nest" eg.
nest-holder, nest-host. But I think nest-top is an ideal match. (And no I didn't invent this
terminology :) ).
Okay. Probably best to sit on this for awhile and circle back to it, see how
we're all feeling.
I think nest-host is starting to take the lead :)
I follow Brian's model when it comes to nest membership (5.4.4): every class
belongs to a nest, possibly (in the absence of MemberOfNest) the nest hosted by
itself. Many nests are singletons without any associated explicit attributes.
I have not yet seen a follow up from John as to whether we require or just
allow, an empty NestMembers attribute to indicate a singleton nest member.
If in our model *every* class has a nest, then we definitely don't want to
require every class to have a NestMembers attribute.
In my spec (5.4.4), the presence of a NestMembers attribute is irrelevant. A
class without a MemberOfNest attribute belongs to its own nest, and there's no
need to validate anything about NestMembers.
Verification of MemberOfNest
I include a discussion block about different options of validating
MemberOfNest. I think the consensus, and my preference, is to do it during
verification of the member class. (NestMembers, on the other hand, is never
validated, except as a side-effect of checking MemberOfNest.)
Given verification is optional the current approach is to validate during
linking just prior to the verification step. Whether that can conceptually be
considered part of verification I am unsure.
I think this is the source of a lot of anomalies—spec treats verification as an
essential part of linking, while Hotspot treats it as an optional add-on.
You should be okay, for some definition of "just prior". These are the
constraints, as I read them:
- "A class or interface is completely loaded before it is linked." (5.4)
- Loading includes structural checks (CFE), version checks (UCVE), and
validation and loading of superclasses and superinterfaces (5.3.5)
- "Errors detected during linkage are thrown at a point in the program where some
action is taken by the program that might, directly or indirectly, require linkage to the
class or interface involved in the error." (5.4)
That last one is pretty mysterious to me (what does "some action" mean? what about
"indirectly"?), but seems to grant pretty wide latitude on the timing.
I believe the intent with that is to preserve the notion of precise
exceptions that the language has. Exceptions should be seen to be thrown
due to a specific action of the program that required a class to be
loaded/linked/initialized and which subsequently threw an exception.
Anyway, yes I think I'm okay with validation during linking. That said
the reason I placed it before verification was because verification
could require the nestmate-access check if it encountered a
private-invokespecial. But as per discussion elsewhere (and perhaps
below) your spec changes do not require the nestmate check, so the
problem I was addressing no longer exists.
Verification of invokespecial
Allowing invokespecial to refer to classes other than the current class and its
supers is a significant change. I noticed and relied heavily on the parallel
with invokevirtual making protected method calls. So I tried to make the two as
similar as possible. In a few places, the treatment of protected methods
doesn't seem ideal, and rather than trying to mirror that with non-private
invokespecial, I modified the protected method treatment.
This concerns me because protected-access is a somewhat complex/messy issue so
aligning private-access with it seems to go in the wrong direction. Cleaning up
the protected access rules, while perhaps desirable, is technically
out-of-scope for this JEP in my opinion.
The "protected check" of verification, in particular, was a mess before, and I
think I've made it a lot more manageable, and compatible with a parallel rule for
invokespecial. I could use some feedback on exactly what Hotspot's verifier has been
doing here, though, since I'm pretty sure it didn't match the old specified rules.
Might be better to take this up separately.
Yeah, I hear you. But I worry about the technical debt if we just look the
other way and specify/implement a fresh invokespecial check separately. The
subtle inconsistencies between the two will lead to all sorts of bugs and
tweaks in the future; the duplicate code/spec will be an ongoing maintenance
burden.
In terms of implementation the verification of invokespecial involves a
number of checks, including the subtype check. What I have been doing is
simply ignoring the result of the subtype check if I am dealing with
nestmates. With the proposed spec for private invokespecials I can
simply skip the subtype test altogether. It is unclear what the
ramifications are of the changes to protected access on the
implementation. The mapping from rules to code is not obvious to me.
From the spec perspective, pulling on the thread looks like this:
- The new verification we need on invokespecial looks almost the same as the
verification we need on protected methods.
- Okay, I'll re-use that rule. But it's a complicated mess, and has bugs, so
I've got to clean it up first.
- What exactly are these rules trying to achieve? I can see the hard
constraints (in configuration X, an error occurs), but there's a soft
requirement to minimize class loading.
- To figure out when class loading is acceptable, I need to know what the
reference implementation actually does.
I just don't like the idea of messing with the protected access checks.
MethodHandle resolution
The spec (5.4.3.5) is vague about what errors can occur during MethodHandle
resolution. I assume any linkage error specified for the instruction can also
occur via MethodHandle resolution, and that will include failures due to
invokespecial improperly referencing a class.
This is an area I have least understanding of, but this seems to cover it:
"In each step, any exception that can be thrown as a result of failure of resolution
of a class or interface or field or method reference can be thrown as a result of failure
of method handle resolution."
The ambiguity is in the definition of "resolution". The preferred
interpretation, which everyone seems to confirm reflects reality, is that it includes all
linkage errors specified for the referencing instruction. I.e., errors from both both
5.4.3 and 6.5 can occur.
Yes. 5.4.3 states that "Certain of the instructions above require
additional linking checks when resolving symbolic references." And that
those linking exceptions are listed with the instructions themselves. So
yes resolution errors consist of the union thereof.
Dynamic checking of invokespecial receivers
When invokespecial involves interface types, the verifier can't guarantee that
receiver objects actually implement that interface (JDK-8134358). It's an open
question whether this is really a problem, but I included a tentative fix in
the invokespecial runtime rules.
This seems out of scope in the general case, but there may be nestmate specific
actions required.
I touch it here because:
- It's a dynamic companion to verification, and we've modified the verification
rules
- There's an existing rule that may or may not be intended to perform part of
this dynamic check; that needs to be addressed somehow
- Any rule we introduce is impacted by the new treatment of private methods, so
easier to get it right while working on this feature
- Easier to address small things like this when it's swapped in already
If necessary, we can yank it and address the bug separately in the future, but
it seems convenient to tackle it now.
Compiler changes
The JEP text can't seem to decide if compiler changes are part of it, or a
follow-up exercise. I think we're best off explicitly including the compiler
changes, which will provide opportunities for design validation and testing.
Not sure what gives you that impression, but javac changes are an essential
part of this. Lacking a compiler spec I assume this will be handled informally.
But the attribute definitions (see comments below) should dictate what a source
compiler is required to do.
- Would be clearest if "Goals" mentioned modifying javac
- Language like "javac can" suggests optional followup work; should have a paragraph
saying "javac will" instead
- Remove the paragraph saying "it is not strictly required to remove these
immediately as part of this feature"
Agreed - I think I already gave similar feedback but it didn't get taken up.
Security risk
The JEP text should acknowledge that, while this does allow compilers to grant
finer-grained access to members shared by nestmates, it also pushes compilers
to grant broader access to members that were previously kept private. It's a
trade-off, and presumably a good one because nestmates are completely trusted,
while package-mates might sometimes be suspect.
I'm not following you here. What broader access is being granted to members
that were previously kept private? This effort allows private-only access to
members that were previously package-accessible.
Old approach is to widen access to private members to the package level on a
per-member basis.
New approach is to widen access to private members to the nest level on a
per-class basis.
Nest is narrower than package (good!), but per-class is more permissive than
per-member (bad). Some members that are compiled as totally private in JDK 9 will be
incidentally shared with all nest members in <future JDK version>.
I'm not sure how you are coming at this. For a private member in 9 to be
incidentally shared in <future-JDK> the class of the member has to be
added to a nest. With no context as to how this may happen I can't
really comment further. From the Java language perspective if the class
is in a nest then such access to private members already exists.
4.7 Attributes
Regarding the Exceptions attribute, it seems to be used only to store data to
be passed back to the libverify and the reflection API.
I'm not familiar with libverify, but I think you're confirming that the
attribute is on par with Signature? Something for tools and standard APIs to
rely on, but irrelevant to execution?
Right - sorry that wasn't clear.
Regarding the definitions of NestMembers and MemberOfNest ... I modelled the
definitions in:
https://bugs.openjdk.java.net/browse/JDK-8177020
based on those for the innerclass (and related) attributes. I'm not sure why
you moved away from those definitions as they pinned down exactly when these
attributes are expected to appear. Without those parts the current definitions
seem optional - ie they define what the attribute means if present, but they do
not require its presence. I do not believe these attributes should be optional
for classfile version 54, but required whenever a nest does exist. A java
source compiler writer should look at these definitions and know when they must
generate these attributes.
These sorts of assertions belong in the "Java Compiler Specification". (Yes,
this document does not exist. Yes, that's kind of a problem, but we muddle through.)
Part of that muddling through, to me, is the ability for a compiler
writer to understand when they must generate certain classfile contents.
Hence my approach here.
In an ideal world, JVMS would not acknowledge that the Java Programming
Language exists. It should stand on its own, and Java is merely a client.
As is, we have some history that has led to some dependencies on Java in spec,
but these are generally auxiliary items: attributes that don't impact JVM
behavior (4.7), a tutorial on how to compile Java code (3), occasional
explanatory references to Java language concepts (e.g., 2.9).
But it would be wrong for the specification of a "critical" attribute (4.7) to
be defined in terms of Java language concepts. It should stand on its own, general enough
to meet the needs of Java language compilers and others with different use cases.
I think you have somewhat rose-coloured glasses on if you think the JVMS
is only tainted by the Java language in "auxiliary items". :) The whole
set of rules around invokespecial pertain specifically to how it
supports the Java language requirements for constructor invocations and
super-type invocations. Otherwise invokespecial would have remained as
its original invokenonvirtual - which makes more sense to me even now.
4.7.26 The MemberOfNest Attribute
"the class is implicitly a member of its own nest"
I suggest changing "a" to "the sole". Otherwise "a" implies there may be other
members. As these are static properties being defined I don't think we need to be accounting for some future
dynamic expansion of a nest.
I intended to include both the cases of a singleton nest and the case of a
class that has a NestMembers attribute. But I'll add a clarifying sentence.
4.9.2 Structural Constraints
The changes to invokespecial seem okay, but I have a problem with terminology.
In:
"If invokespecial is used to invoke a non-private, non-<init> method, referenced via
a superclass or a direct superinterface, ..."
what does "referenced via" mean?
I'll change to "where the referenced class is"
Ok.
In:
"If getfield or putfield is used to access a protected field referenced in a
superclass ..."
and following, you changed "declared" to read "referenced". Again I do not understand
what this is supposed to mean. It is the declaration site of the field or method that is needed to determine
whether that member is in the same or another package. To me "declared in a superclass ..." was
exactly correct. Your note does not make the change any clearer to me.
Good catch. It's important that the rule be limited to cases that reference a
superclass*; but it's also important that the declaring class be used for the
package check.
(*Here's why: if I reference my subclass, and that resolves to a field declared
in my superclass, the verifier will perform no check, because it may not know
that the referenced class is a subclass.)
Revised:
"If `getfield` or `putfield` is used to access a `protected` field by reference to a
superclass, and that field is declared in a different run-time package than the current
class or interface, then the type of the class instance being accessed must be assignable
to the current class or interface."
I still think "declared in" is the correct terminology. There's a nuance
to how you are expressing this that I'm just not getting.
4.10.1 Verification by Type Checking
Does the order in the rules mandate the order in which the VM must actually
check things? classHasValidNest seems much more complex than the actual runtime
process of validation (though it may just be exposing distinct steps that are
lumped together at runtime).
Probably not, but dependencies between inputs/outputs do suggest an ordering.
(We could probably get into an argument with a logician or a Prolog expert about whether
there's such a thing as an "output" at all, and whether it would be acceptable
for a JVM to load every class in the universe in order to test certain predicates, which
is why it's probably not a great idea to specify the verifier using Prolog rules. Alas.)
The important elements are:
1) Resolve the host class using the current class's loader
2) Check that the host class belongs to the same package
3) Check that the current class is named in the host's NestMembers
4) Resolve the current class's name in the host class's loader, and make sure
it produces the same class
---
4.10.1.8 Type Checking for Restricted Member References
I'm not very good at reading or understanding these rules, but I'm surprised
that the invokespecial rules seem to make no mention of nestmates at all:
"A method reference is allowed by an invokespecial instruction if it is not a
restricted invokespecial reference, ..."
This seems to open things up to all private method references??
There's a careful partitioning of error conditions that has to happen in order
to avoid too much class loading.
4.10.1.8 is only concerned with ensuring, in narrow cases, that the type on the stack is
assignable to the current class. For invokespecial, the narrow case is a method reference
naming a superclass/direct superinterface and a method not named "<init>",
where the resolved method is not private.
Separately, 5.4.4 ensures that resolving a private method will produce an
IllegalAccessError if the method is not in the same nest.
Also separately, 6.5 ensures that if the resolved method is not private, the
referenced class is the current class, a superclass, or a direct superinterface.
And also separately, 4.10.1.9 ensures that the type on the stack is always
assignable to the referenced class.
Okay. I have touched on this elsewhere but worth reiterating I think. I
had been working on the expectation that an attempt to use
private-invokespecial when the particpants were not nestmates would fail
verification. I added an explicit nestmate check to the verifier logic.
But instead you're loosening the verifier to simply accept any
private-invokespecial. Then the actual invocation is access-checked.
That's okay - it simplifies things somewhat and avoids the potential
need to load a foreign nest-host during verification. I'm just a little
surprised. :)
5.4.4 Access Control
I don't think you need to restate:
"A class with a MemberOfNest attribute belongs to the nest hosted by the
referenced host class.
A class without a MemberOfNest attribute implicitly belongs to a nest hosted by the
class itself. (If the class also lacks a NestMembers attribute, then the nest has
only one member.)"
The rule is simply stated as is: "belonging to the same nest as D" - belonging
to the same nest is, or should be, already defined elsewhere.
My intent is that this *is* the definition. But sounds like you're expecting
that definition to be in 4.7.25/4.7.26, which is probably more intuitive. I'll
change that.
I'm avoiding commenting on the protected access changes but do want to raise a
concern that there are no changes to Method resolution (5.4.3.3) yet resolution
relies on the definition of access control to prune/discard inaccessible
methods. It appears now that we're allowing more potential successful
resolutions, then using the additional rules on the invoke* bytecodes to try
and discard any undesirable results.
Yeah, this motivated my question about MethodHandle resolution: it seems that when we
talk about "resolution", we really mean the process defined in 5.4.3, followed
by any linkage checks defined in 6.5. If so, this is just a presentational reshuffling
(with the minor exceptions I noted in the document).
Why bother? Because if the check on the referenced class of a protected
invokevirtual/getfield/putfield is considered an access check (5.4.3.3), then the check
on the referenced class of a non-private invokespecial probably ought to be considered an
access check, too, but that leads to a concept of "accessible" that is really
unwieldy and context-dependent.
6.5 getfield (and others)
You note "These rules are redundant: verification already guarantees them " but
this brings me back to an area I keep raising concerns about: the split between static
verification based checks and dynamic runtime checks. Yes the verifier precludes certain
things but if we run without verification the rules expressed for the bytecodes are
considered to be still required at runtime. If you delete them because they overlap with
verifier rules we have no way to tell which rules must be enforced regardless of
verification status.
I propose removing them because on their face they appear completely redundant,
an artifact of a time when we didn't have clear definitions of loading,
linking, verification, etc. But if we decide it really is helpful to keep some
or all of these assertions, that's fine.
It seems to me that attempting to interpret unverified bytecodes is an
implementation-specific feature that JVMS knows nothing about. Deciding which
rules must be enforced in the absence of verification is an important part of
designing that feature, and in an ideal world some comprehensive documentation
about which verification assertions you've chosen to enforce dynamically would
be maintained somewhere, but not in JVMS. If we want JVMS to maintain that
list, then running without verification should be a first-class fully-specified
feature.
The status quo, I'm guessing, is that a few dynamic checks are sprinkled
throughout JVMS, but plenty of other checks are performed without any
supporting spec text. Which isn't a great place to be in.
Okay this remains a problem area for me as I have flagged elsewhere. I
had assumed the constraints in 6.x defined runtime constraints that had
to be enforced regardless of whether or not verification had occurred.
6.5 invokespecial
You note regarding the IllegalAccessError:
"This replaces a VerifyError previously specified by 4.9.2. The check must be
delayed until after resolution in order to determine whether the referenced method is
private."
We know the access flags for the referenced method at verification time -
shouldn't that be the sole basis for the verifier check? Afterall it is
verifying the static properties of the classfile and bytecode. If the actual
resolved method has different access to the referenced method then that may
lead to ICCE (depending on the exact nature of the change - a private method
made public may not be an issue for example).
Example:
invokespecial SomeOtherClass.foo(I)V
Verification relies on the descriptor to tell it that this method expects an
int and returns nothing. Those kinds of checks can be performed locally.
The descriptor doesn't tell you if the method is private. Only way to find out is to
resolve SomeOtherClass, then resolve "foo(I)V".
If we wanted to do the check at verification time, we'd have to load every
class named by an invokespecial instruction.
Now I am confused. I thought the access modifier of the method described
by the Method_Ref was available to us. If it is not then how can all
these verification rules be expressed in terms of non-private when we do
not have the means at verification time to determine if the access is
private or not ???
(I think you might be conflating declaration-site metadata, which we have easy
access to, with use-site metadata, which requires resolving references.)
5.4.3.3 Method resolution
You did not make any changes here, but as per my comment in the bug report we do require, IMHO,
further tightening here to ensure nestmate invokespecial invocations do not resolve to method
implementations that they should not. The example here is a hierarchy of nestmates (C extends B
extends A) where A and C both declare a private method "void m()" and we have an
invocation c.m() where c is an instance of class C. We then "separately compile" C to
remove the definition of m(). At runtime method resolution will locate A.m, even though the method
reference was for C.m. Normally the accessibility rules would exclude A.m from being a viable
candidate but here the classes are nestmates so A.m is accessible. But it would be wrong to invoke
A.m. This case should throw ICCE or NSME.
You're taking issue with the fact that method resolution will match private
methods in superclasses, when our intuition (and the Java language model) is
that private methods are not inherited.
I agree: https://bugs.openjdk.java.net/browse/JDK-8021581
Interesting - yes this is a similar problem.
Is this the right venue to address that issue? As demonstrated above, I'm happy
to lump in tangentially-related bug fixes. :-)
You could argue that this situation isn't so different from the status quo, but
it does seem unique that you can delete 1 private method from a
consistently-compiled program and then get a silent behavioral change.
Yes. The code was written to expect C.m to execute not A.m - the code
doesn't even know what A.m might do!
In practice, javac won't separately compile nestmates, so this would be hard to
reproduce. More generally, the feature is not designed for separate
compilation, so real-world scenarios may be hard to come by, even in other
languages.
Right - it is handcrafted example in the current context. But I have no
idea how a non-Java language may utilise nestmates or allow separate
compilation to occur.
I addressed this in the prototype by ensuring that the declared class of
the resolved method equals the class of the method from the method
descriptor.
Thanks,
David
Anyway, I'm game to take another look at JDK-8021581 if we want.
—Dan