On Mar 22, 2022, at 10:52 PM, Dan Smith <daniel.sm...@oracle.com> wrote:

On Mar 22, 2022, at 7:21 PM, Dan Heidinga <heidi...@redhat.com> wrote:

A couple of comments on the encoding and questions related to descriptors.


JVM proposal:

- Same conceptual framework.

- Classes can be ACC_VALUE, ACC_IDENTITY, or neither.

- Legacy-version classes are implicitly ACC_IDENTITY. Legacy interfaces are 
not. Optionally, modern-version concrete classes are also implicitly 
ACC_IDENTITY.

Maybe this is too clever, but if we added ACC_VALUE and ACC_NEITHER
bits, then any class without one of those bits set (including all the
legacy classes) is an identity class.


(Trying out this alternative approach to abstract classes: there's no more 
ACC_PERMITS_VALUE; instead, legacy-version abstract classes are automatically 
ACC_IDENTITY, and modern-version abstract classes permit value subclasses 
unless they opt out with ACC_IDENTITY. It's the bytecode generator's 
responsibility to set these flags appropriately. Conceptually cleaner, maybe 
too risky...)

With the "clever" encoding, every class is implicitly identity unless
it sets ACC_VALUE or ACC_NEITHER, and bytecode generators have to
explicitly flag modern abstract classes.  This is kind of growing on
me.

A problem is that interfaces are ACC_NEITHER by default, not ACC_IDENTITY. 
Abstract classes and interfaces have to get two different behaviors based on 
the same 0 bits.

Here's another more stable encoding, though, that feels less fiddly to me than 
what I originally wrote:

ACC_VALUE means "allows value object instances"

ACC_IDENTITY means "allows identity object instances"

If you set *both*, you're a "neither" class/interface. (That is, you allow both 
kinds of instances.)

If you set *none*, you get the default/legacy behavior implicitly: classes are 
ACC_IDENTITY only, interfaces are ACC_IDENTITY & ACC_VALUE.
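
To make the "allows" reading concrete, here is a minimal sketch of how the two
bits would decode under this encoding. The class name, the method name, and the
bit values are hypothetical, chosen only for illustration; this message does
not assign values at this point.

```java
import java.util.EnumSet;
import java.util.Set;

final class AllowsEncodingSketch {
    // Hypothetical bit values, used only for this illustration.
    static final int ACC_IDENTITY = 0x0020;
    static final int ACC_VALUE = 0x0040;

    enum InstanceKind { IDENTITY, VALUE }

    /** Returns the kinds of instances a class or interface allows under the "allows" encoding. */
    static Set<InstanceKind> allowedInstances(int accessFlags, boolean isInterface) {
        boolean identity = (accessFlags & ACC_IDENTITY) != 0;
        boolean value = (accessFlags & ACC_VALUE) != 0;
        if (!identity && !value) {
            // No bits set: default/legacy behavior. Classes are identity-only,
            // interfaces allow both kinds of instances.
            return isInterface
                    ? EnumSet.allOf(InstanceKind.class)
                    : EnumSet.of(InstanceKind.IDENTITY);
        }
        Set<InstanceKind> kinds = EnumSet.noneOf(InstanceKind.class);
        if (identity) {
            kinds.add(InstanceKind.IDENTITY);
        }
        if (value) {
            kinds.add(InstanceKind.VALUE);
        }
        // Both bits set yields both kinds, i.e. a "neither" class/interface.
        return kinds;
    }
}
```

Note that in this variant the zero-bits case still needs to know whether it is
looking at a class or an interface; only the explicit cases are uniform.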

Update on encoding: after some internal discussion, I've found this to be the 
most natural fit:

- ACC_VALUE (0x0040) corresponds to the 'value' keyword in source files
- ACC_IDENTITY (0x0020) corresponds to the (often implicit) 'identity' keyword 
in source files
- If neither is set, the class/interface supports both kinds of subclasses (and 
must be abstract)
- If both are set, or any supers' flags conflict, it's an error
- In older-version classes (not interfaces), ACC_IDENTITY is assumed to be set
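
As a rough sketch of the rules just listed (not the actual JVM check), a
per-class classification could look like the following. ACC_VALUE and
ACC_IDENTITY use the values given above; ACC_INTERFACE and ACC_ABSTRACT are the
standard class-file values. The class name, the classify method, and the
olderVersion parameter are hypothetical. Two assumptions are baked in: the
ACC_VALUE bit is ignored in older-version classes, and interfaces count as
abstract. The check that supers' flags don't conflict is left out, since it
needs the resolved superclass and superinterfaces.

```java
final class ClassFlagsSketch {
    static final int ACC_IDENTITY  = 0x0020; // value given in the message
    static final int ACC_VALUE     = 0x0040; // value given in the message
    static final int ACC_INTERFACE = 0x0200; // standard class-file value
    static final int ACC_ABSTRACT  = 0x0400; // standard class-file value

    enum Kind { IDENTITY, VALUE, BOTH }

    /**
     * Classifies one class or interface from its own access flags.
     * olderVersion is a hypothetical stand-in for "class file version
     * predates these flags".
     */
    static Kind classify(int accessFlags, boolean olderVersion) {
        boolean isInterface = (accessFlags & ACC_INTERFACE) != 0;
        if (olderVersion && !isInterface) {
            // Older-version classes are treated as if ACC_IDENTITY were set.
            // Assumption: the ACC_VALUE bit is simply ignored for them.
            return Kind.IDENTITY;
        }
        boolean identity = (accessFlags & ACC_IDENTITY) != 0;
        boolean value = (accessFlags & ACC_VALUE) != 0;
        if (identity && value) {
            throw new IllegalArgumentException("ACC_IDENTITY and ACC_VALUE are both set");
        }
        if (identity) {
            return Kind.IDENTITY;
        }
        if (value) {
            return Kind.VALUE;
        }
        // Neither flag: supports both kinds of subclasses, so it must be abstract.
        boolean isAbstract = isInterface || (accessFlags & ACC_ABSTRACT) != 0;
        if (!isAbstract) {
            throw new IllegalArgumentException("neither flag set on a concrete class");
        }
        return Kind.BOTH;
    }
}
```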

What about newer-version classes that use old encodings? (E.g., a tool bumps
its output version number but isn't aware of these flags.) There's a sneaky
trick here that minimizes the risk: ACC_IDENTITY re-uses the old ACC_SUPER bit,
which no longer has any effect and which we've encouraged generators to set
since Java 1.0.2. So if you're already setting ACC_SUPER in your classes,
you've automatically opted in to ACC_IDENTITY; doing something different
requires making changes to the generated code.

So the remaining incompatibility risk is that someone generates a class (not an
interface) with a newer version number and with neither flag set (violating the
"always set ACC_SUPER" advice). Then either the class won't load (it's
concrete, it declares an instance field, etc.), or it's abstract and
accidentally supports value subclasses, and so can be instantiated without
running <init> logic. The number of unlikely events in this scenario seems like
enough for us not to be concerned.
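
For tool authors who want to see where their output lands, the scenario can be
sketched as a small decision function for a newer-version class (not an
interface) that does not set ACC_VALUE. The class, enum, and method names are
hypothetical, and only the two load failures named above are modeled; the
"etc." cases are not.

```java
final class MissingSuperFlagSketch {
    static final int ACC_SUPER    = 0x0020; // same bit the message reuses as ACC_IDENTITY
    static final int ACC_ABSTRACT = 0x0400; // standard class-file value

    enum Outcome { IDENTITY_CLASS, FAILS_TO_LOAD, PERMITS_VALUE_SUBCLASSES }

    /**
     * Models what happens to a newer-version class (not an interface, no
     * ACC_VALUE) depending on whether the generator set ACC_SUPER.
     */
    static Outcome outcome(int accessFlags, boolean declaresInstanceField) {
        if ((accessFlags & ACC_SUPER) != 0) {
            // The usual case: setting ACC_SUPER already opts in to ACC_IDENTITY.
            return Outcome.IDENTITY_CLASS;
        }
        boolean isAbstract = (accessFlags & ACC_ABSTRACT) != 0;
        if (!isAbstract || declaresInstanceField) {
            // Neither flag set on a concrete class, or on one with an instance field.
            return Outcome.FAILS_TO_LOAD;
        }
        // Abstract, no instance fields, neither flag: value subclasses are
        // allowed by accident.
        return Outcome.PERMITS_VALUE_SUBCLASSES;
    }
}
```

Only the last outcome is silent, and reaching it takes a newer version number,
a missing ACC_SUPER, an abstract class, and no instance fields all at once.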
