> On Jan 31, 2018, at 2:50 PM, Brian Goetz <[email protected]> wrote:
>
> OK, so the goal is clearly to *prohibit* cyclic constants by enforcing a
> depth constraint on condy-in-condy nesting. I like this goal much better,
> and I see how the solution addresses it.
>
> Still, I have to wonder whether its worth the complexity of capturing it
> explicitly with a new (stackmap-like) attribute, which must be specified, and
> gives us more chances for the classfile to be out of sync with itself, for
> the goal of preventing something that can't usefully happen anyway -- as we
> discussed previously, trying to actually resolve a cyclic constant will
> likely result in SOE, so such classfiles are pretty much useless and
> therefore not likely to spread very far.
This is a reasonable critique. Another critique is that the cost of tracking
down and parsing an extra attribute will undo a lot of the performance benefit
of having it, perhaps even causing a net slowdown.
As we've discussed the idea a little more, my inclination (raised in the
meeting this week) is to give up on trying to add structural restrictions on
constant pools designed to facilitate cycle detection. It seems like we can, in
fact, detect cycles without extra guidance, and with minimal overhead in most
cases. Some details below.
Big picture, the most attractive path forward seems to be:
- In 11, a CONSTANT_Dynamic with a cycle is an "error", reported via SOE at
resolution time
- When we introduce lazy condy resolution, we move the error detection to class
load time, with a check that we're comfortable isn't disrupting startup time
- As other potentially-cyclical constants are introduced, we use the same load
time checks
- We never require extra information from the compiler, or reject classes that
are hard to prove cycle-free
One nice thing about that last point is that we don't have to worry about
incompatible support for "hard" classes in different releases.
I'd love to hear about some experiments with this. We're talking about
performance impact, and all discussion to this point has been hypothetical.
Would be good to get some concrete feedback.
—Dan
-----
Details:
We started with a rule for format checking (JVMS 4.8) that says it's illegal
for a CONSTANT_Dynamic (or, in the future, other potentially-cyclical
constants) to refer, via static arguments, to itself—directly or indirectly.
But we quickly decided this sounded expensive, and we didn't want a heavy
validation step to mess with startup time.
Hence, we explored various ways to prohibit otherwise-valid classes that would
be hard to prove cycle-free. Something like the RecursiveConstants attributes
seems to be the best option in that space.
However, scrutinizing the original assumption, it seems reasonable that we can
optimize the check so that it's extremely lightweight when references are
"backward" (to previous entries in the constant pool), and only performs more
expensive analysis when references are "forward". Most classes should naturally
use backward references exclusively, since entries tend to be generated
bottom-up. (In theory. If there are counter examples produced by any mainstream
bytecode generators, that would be good to know.)
So a dumb (worst case O(n^2)) but fast (very low constant cost) check could
look something like:
void checkCP(int i) {
switch (tag(i)) {
case CONSTANT_Dynamic:
...
for (int j : condyStaticArgs(i)) {
if (j >= i && tag(j) == CONSTANT_Dynamic) {
assertUnreachable(i, j); // can't reach i from j
}
}
...
...
}
}
It's already necessary to check that constant pool references are well-typed,
so everything up to 'assertUnreachable' here is practically free*. The
'assertUnreachable' call might be somewhat expensive, but it will rarely be
called in practice (and even then, we don't expect many structures to be
particularly deep).
(*In the particular case of CONSTANT_Dynamic, there's an indirection through
BootstrapMethods. If a class heavily shares BootstrapMethods entries, then
there's a new cost here, iterating over the same static arguments multiple
times. Some optimization could be done to speed that up. For example, if there
are no forward references on the first pass, there will be no forward
references on later passes, when i is a larger number.)
A good worst-case test would be a constant pool without cycles, built entirely
out of forward references, consisting of lots of deep CONSTANT_Dynamic trees.
Needless to say, we don't expect to encounter classes like this in the wild.
But, if we do, the big benefit here is that, while they'll be a little slower
to load, they'll still work.