> From: "Brian Goetz" <brian.go...@oracle.com>
> To: "Remi Forax" <fo...@univ-mlv.fr>
> Cc: "daniel smith" <daniel.sm...@oracle.com>, "Dan Heidinga"
> <heidi...@redhat.com>, "John Rose" <john.r.r...@oracle.com>,
> "valhalla-spec-experts" <valhalla-spec-experts@openjdk.java.net>
> Sent: Monday, 20 December 2021 20:26:01
> Subject: Do we even need IO/VO interfaces? (was: JEP update: Value Objects)

> I thought we were wrapping this up; I'm not sure how we got back to "do we 
> even
> need these at all", but OK. Splitting off a separate (hopefully short) thread.

> These interfaces serve both a dynamic and static role. Statically, they allow 
> us
> to constrain inputs, such as:

> void runWithLock(IdentityObject lock, Runnable task)

> and similar use in generic type bounds.

> Dynamically, they allow code to check before doing something partial:

> if (x instanceof IdentityObject) { synchronized(x) { ... } }

> rather than trying and dealing with IMSE.
The static role is undermined by java.lang.Object being a supertype of both 
IdentityObject and ValueObject. 
java.io.Serializable is useless as a type: ObjectOutputStream.writeObject() 
takes an Object, not a Serializable, 
and likewise Arrays.sort() takes an Object[], not an array of Comparable. 
Like Serializable or Comparable, IdentityObject as a type is easily lost 
because Object sits above it. 

If the type IdentityObject can be lost, there is little point, as an API 
designer, in declaring a method that takes an IdentityObject parameter: it 
forces the user of the API to insert a cast, trading a CCE for an IMSE. 
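
To make the "trading a CCE for an IMSE" point concrete, here is a minimal 
sketch. It assumes the proposed IdentityObject interface and reuses Brian's 
runWithLock signature; the Map-based lookup is only a hypothetical example of 
an Object-typed API through which the static type gets lost: 

import java.util.Map; 

class LockRegistry { 
    // IdentityObject is the marker interface proposed in this thread 
    static void runWithLock(IdentityObject lock, Runnable task) { 
        synchronized (lock) { task.run(); } 
    } 

    static void demo(Map<String, Object> locks) { 
        Object lock = locks.get("db");                // static type is already Object 
        runWithLock((IdentityObject) lock, () -> {}); // cast required: may throw CCE 
        // with an Object-typed parameter instead, synchronized(lock) may throw IMSE 
    } 
} 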

For the dynamic role, x.getClass().isValue() does the same thing more 
efficiently (unless the VM has a special optimization for IdentityObject). 
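
For comparison, the two forms of the dynamic check (a sketch, with x and task 
as in Brian's example, assuming both the injected interface and a reflective 
Class::isValue method exist as discussed): 

// interface-based check, requires the injected IdentityObject type: 
if (x instanceof IdentityObject) { 
    synchronized (x) { task.run(); } 
} 
// reflection-based check, no marker interface needed: 
if (!x.getClass().isValue()) { 
    synchronized (x) { task.run(); } 
} 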

Moreover, very few methods synchronize on a user-provided object, because 
doing so makes the concurrent code hard to reason about. 
Adding a bit to the type system to support code that people should not be 
writing in the first place is not exactly a win. 

> Introducing new interfaces that have no methods is clearly source- and binary
> compatible, so I am not particularly compelled by "some very brittle and badly
> written code might break." So far, no one has proposed any examples that would
> make us reconsider that.
?? 
You are forgetting inference; this code will fail to compile: 
class A {} 
class B {} 
var list = List.of(new A(), new B()); 
List<Object> list2 = list; 
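
(It fails because, once A and B both implement the injected IdentityObject, 
the least upper bound used for inference becomes IdentityObject instead of 
Object, so the inferred type is List<IdentityObject>, which is not assignable 
to List<Object>.) 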

> As to "value class" vs "primitive class" vs "built in primitive", I see no
> reason to add *additional* mechanisms by which to distinguish these in either
> the static or dynamic type systems; the salient difference is identity vs
> value. (Reflection will almost certainly give us means to ask questions about
> how the class was declared, though.)
Primitives (builtin or not) allow tearing, so by that logic we should also 
introduce two interfaces, TearableObject and NonTearableObject, because 
knowing whether something can tear clearly changes which algorithms can be used. 
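
(To spell the reductio out, such a marker would be used exactly like 
IdentityObject; TearableObject here is of course hypothetical.) 

if (x instanceof TearableObject) { 
    // publish only via a volatile field or under a lock to avoid torn reads 
} else { 
    // no torn reads possible; ordinary publication concerns still apply 
} 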

> As to B3: instanceof operates on reference types, so (at least from a pure 
> spec
> / model perspective), `x instanceof T` gets answered on value instances by
> lifting to the reference type, and answering the question there. So it would
> not even be a sensible question to ask "are you a primitive value vs primitive
> reference"; subtyping is a "reference affordance", and questions about
> subtyping are answered in the reference domain.

> And to B4: the goal is to make B3 and B4 as similar as possible; there are 
> going
> to be obvious ways in which we can't do this, but this should not be relevant
> to either the static or dynamic type system.
I agree that B3 and B4 should be as similar as possible, but we still need 
Class.isPrimitive() to return true only for builtin primitives, for backward 
compatibility. 
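
A sketch of the backward-compatible behaviour I mean (Point stands in for a 
hypothetical user-declared B3 primitive class): 

assert int.class.isPrimitive();     // builtin primitive (B4): true, as today 
assert !String.class.isPrimitive(); // identity class: false, as today 
assert !Point.class.isPrimitive();  // declared primitive class (B3): should stay false 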

Rémi 

> On 12/20/2021 2:05 PM, Remi Forax wrote:

>> Brian,
>> the last time we talked about IdentityObject and ValueObject, you said that
>> you were aware that introducing those interfaces would break some existing
>> code, but that you wanted to know whether it was a lot of code or not.

>> So I do not understand why you now want to mix IdentityObject/ValueObject
>> with the runtime behavior; it seems risky, and if we need to back out the
>> introduction of those interfaces, it will be more work than it should be.
>> Decoupling the typing part from the runtime behavior seems a better solution.

>> Moreover, the split between IdentityObject and ValueObject makes less sense
>> now that we have 3 kinds of value objects: the identityless reference (B2),
>> the primitive (B3) and the builtin primitive (B4).
>> Why do we want these types to be visible in the type system but not, for
>> example, the set containing only B3 and B4?

>> Rémi

>>> From: "Brian Goetz" <brian.go...@oracle.com>
>>> To: "daniel smith" <daniel.sm...@oracle.com>, "Dan Heidinga" <heidi...@redhat.com>
>>> Cc: "John Rose" <john.r.r...@oracle.com>,
>>> "valhalla-spec-experts" <valhalla-spec-experts@openjdk.java.net>
>>> Sent: Monday, 20 December 2021 18:54:01
>>> Subject: Re: JEP update: Value Objects

>>> I was working on some docs and am not sure if we came to a conclusion on the
>>> rules about who may, may not, or must declare ValueObject or IdentityObject.

>>> Let me see if I can chart the boundaries of the design space. I'll start 
>>> with
>>> IdentityObject since it is more constrained.

>>> - Clearly for legacy classes, the VM is going to have to infer and inject
>>> IdentityObject.
>>> - Since IdentityObject is an interface, it is inherited; if my super 
>>> implements
>>> IO, so am I.
>>> - It seems desirable that a user be *allowed* to name IdentityObject as a
>>> superinterface of an interface or abstract class, which constrains what
>>> subclasses can do. (Alternately we could spell this "value interface" or 
>>> "value
>>> abstract class"; this is a separate set of tradeoffs.)
>>> - There is value in having exactly one way to say certain things; it 
>>> reduces the
>>> space of what has to be specified and tested.
>>> - I believe our goal is to know everything we need to know at class load 
>>> time,
>>> and not to have to go back and do complex checks on a supertype when a 
>>> subclass
>>> is loaded.

>>> The choice space seems to be
>>> user { must, may, may not } specify IO on concrete classes
>>> x compiler { must, may, may not } specify IO when ACC_VALUE present
>>> x VM (and reflection) { mops up }

>>> where "mopping up" minimally includes dealing with legacy classfiles.

>>> Asking the user to say "IdentityObject" on each identity class seems 
>>> ridiculous,
>>> so we can drop that one.

>>> user { may, may not } specify IO on concrete classes
>>> x compiler { must, may, may not } specify IO when ACC_VALUE present
>>> x VM (and reflection) { mops up }

>>> From a user model perspective, it seems arbitrary to say the user may not
>>> explicitly say IO for concrete classes, but may so do for abstract classes. 
>>> So
>>> the two consistent user choices are either:

>>> - User can say "implements IO" anywhere they like
>>> - User cannot say "implements IO" anywhere, and instead we have an 
>>> "identity"
>>> modifier which is optional on concrete classes and acts as a constraint on
>>> abstract classes/interfaces.

>>> While having an "identity" modifier is nice from a completeness 
>>> perspective, the
>>> fact that it is probably erased to "implements IdentityObject" creates
>>> complication for reflection (and another asymmetry between reflection and
>>> javax.lang.model). So it seems that just letting users say "implements
>>> IdentityObject" is reasonable.

>>> Given that the user has a choice, there is little value in "compiler may not
>>> inject", so the choice for the compiler here is "must" vs "may" inject. 
>>> Which
>>> is really asking whether we want to draw the VM line at legacy vs new
>>> classfiles, or merely adding IO as a default when nothing else has been
>>> selected. Note that asking the compiler to inject based on ACC_VALUE is also
>>> asking pretty much everything that touches bytecode to do this too, and 
>>> likely
>>> to generate more errors from bytecode manglers. The VM is doing inference
>>> either way, what we get to choose here is the axis.

>>> Let's put a pin in IO and come back to VO.

>>> The user is already saying "value", and we're stuck with the default being
>>> "identity". Unless we want to have the user say "value interface" for a
>>> value-only interface (which moves some complexity into reflection, but is 
>>> also
>>> a consistent model), I think we're stuck with letting the user specify 
>>> either
>>> IO/VO on an abstract class / interface, which sort of drags us towards 
>>> letting
>>> the user say it (redundantly) on concrete classes too.

>>> The compiler and VM will always type-check the consistency of the value
>>> keyword/bit and the implements clause. So the real question is where the
>>> inference/injection happens. And the VM will have to do injection for at 
>>> least
>>> IO at least for legacy classes.

>>> So the choices for VM infer&inject seem to be:

>>> - Only inject IO for legacy concrete classes, based on classfile version,
>>> otherwise require everything to be explicit;
>>> - Inject IO for concrete classes when ACC_VALUE is not present, require VO 
>>> to be
>>> explicit;
>>> - Inject IO for concrete classes when ACC_VALUE is not present; inject VO 
>>> for
>>> concrete classes when ACC_VALUE is present

>>> Is infer&inject measurably more costly than just ordinary classfile 
>>> checking? It
>>> seems to me that if all things are equal, the simpler injection rule is
>>> preferable (the third), mostly on the basis of what it asks of humans who 
>>> write
>>> code to manipulate bytecode, but if there's a real cost to the injection, 
>>> then
>>> having the compiler help out is reasonable. (But in that case, it probably
>>> makes sense for the compiler to help out in all cases, not just VO.)

>>> On 12/2/2021 6:11 PM, Dan Smith wrote:

>>>>> On Dec 2, 2021, at 1:04 PM, Dan Heidinga <heidi...@redhat.com> wrote:

>>>>> On Thu, Dec 2, 2021 at 10:05 AM Dan Smith <daniel.sm...@oracle.com> wrote:

>>>>>> On Dec 2, 2021, at 7:08 AM, Dan Heidinga <heidi...@redhat.com> wrote:

>>>>>> When converting back from our internal form to a classfile for the
>>>>>> JVMTI RetransformClasses agents, I need to either filter the interface
>>>>>> out if we injected it or not if it was already there.  JVMTI's
>>>>>> GetImplementedInterfaces call has a similar issue with being
>>>>>> consistent - and that's really the same issue as reflection.

>>>>>> There's a lot of small places that can easily become inconsistent -
>>>>>> and therefore a lot of places that need to be checked - to hide
>>>>>> injected interfaces.  The easiest solution to that is to avoid
>>>>>> injecting interfaces in cases where javac can do it for us so the VM
>>>>>> has a consistent view.

>>>>>> I think you may be envisioning extra complexity that isn't needed here. 
>>>>>> The plan
>>>>>> of record is that we *won't* hide injected interfaces.

>>>>> +1.  I'm 100% on board with this approach.  It cleans up a lot of the
>>>>> potential corner cases.

>>>>>> Our hope is that the implicit/explicit distinction is meaningless—that 
>>>>>> turning
>>>>>> implicit into explicit via JVMTI would be a 100% equivalent change. I 
>>>>>> don't
>>>>>> know JVMTI well, so I'm not sure if there's some reason to think that 
>>>>>> wouldn't
>>>>>> be acceptable...

>>>>> JVMTI's "GetImplementedInterfaces" spec will need some adaptation as
>>>>> it currently states "Return the direct super-interfaces of this class.
>>>>> For a class, this function returns the interfaces declared in its
>>>>> implements clause."

>>>>> The ClassFileLoadHook (CFLH) runs either with the original bytecodes
>>>>> as passed to the VM (the first time) or with "morally equivalent"
>>>>> bytecodes recreated by the VM from its internal classfile formats.
>>>>> The first time through the process the agent may see a value class
>>>>> that doesn't have the VO interface directly listed while after a call
>>>>> to {retransform,redefine}Classes, the VO interface may be directly
>>>>> listed.  The same issues apply to the IO interface with legacy
>>>>> classfiles so with some minor spec updates, we can paper over that.

>>>>> Those are the only two places: GetImplementedInterfaces & CFLH and
>>>>> related redefine/retransform functions, I can find in the JVMTI spec
>>>>> that would be affected.  Some minor spec updates should be able to
>>>>> address both to ensure an inconsistency in the observed behaviour is
>>>>> treated as valid.

>>>> Useful details, thanks.

>>>> Would it be a problem if the ClassFileLoadHook gives different answers 
>>>> depending
>>>> on the timing of the request (derived from original bytecodes vs. 
>>>> JVM-internal
>>>> data)? If we need consistent answers, it may be that the "original 
>>>> bytecode"
>>>> approach needs to reproduce the JVM's inference logic. If it's okay for the
>>>> answers to change, there's less work to do.

>>>> To highlight your last point: we *will* need to work this out for inferred
>>>> IdentityObject, whether we decide to infer ValueObject or not.
