Re: [External] : Re: On tearing

2022-05-16 Thread John Rose


On 27 Apr 2022, at 9:50, Brian Goetz wrote:

…This whole area seems extremely prone to wishful thinking; we so hate 
the idea of making something slower than it could be that we convince 
ourselves that “the user can reason about this.”  Whether or not 
it is “too big a leap”, I think it is a bigger leap than you are 
thinking.


For me, we should make the model clear: the compiler should insert a 
non-user-overridable default constructor, but no more, because using a 
primitive class is already an arcane construct.


This might help a little bit, but it is addressing the smaller part of 
the problem (zeroes); we need to address the bigger problem (tearing).


I think I mostly agree with Remi on this point.

A tearable primitive class (call it T-B3, as opposed to A-B3, which is 
atomic) can, as you describe, have its invariants broken by races that 
have the effect of writing arbitrary (or almost arbitrary) values into 
its fields at any time.


A regular mutable B1 class has a similar problem, except that it can be 
defended by a constructor and/or mutator methods that check per-field 
values being stored.  Let’s look at the simplest case (which is rare 
in practice, since it is scary):  Suppose a class has public fields 
which are mutable.  Call such a class an OM-B1 class, meaning “open 
mutable B1”.
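For concreteness, here is what an OM-B1 class looks like in today’s Java (the class name and invariant are invented for illustration); the constructor can check its arguments, but once the fields are public and mutable the invariant is advisory only:

```java
// An "OM-B1" class in today's Java: an ordinary identity class with
// public mutable fields.  The constructor checks lo <= hi, but nothing
// defends the invariant afterwards: any client, or any data race, can
// write an invalid combination of field values.
class OpenRange {
    public int lo;
    public int hi;

    public OpenRange(int lo, int hi) {
        if (lo > hi) throw new IllegalArgumentException("lo > hi");
        this.lo = lo;
        this.hi = hi;
    }

    public boolean invariantHolds() { return lo <= hi; }

    public static void main(String[] args) {
        OpenRange r = new OpenRange(1, 10);
        r.lo = 42;  // a field-by-field update silently breaks the invariant
        System.out.println(r.invariantHolds());  // prints "false"
    }
}
```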


I think that we can (and probably should) address this educational issue 
by making T-B3 classes look (somehow) like OM-B1 classes.  Then every 
bit of training which leads users to be watchful in their use of OM-B1 
will apply to T-B3 classes.


How to make T-B3 look like OM-B1?  Well, Remi’s idea of a mandated 
open constructor gets most of the way there.  Mandating that the B3 
fields are public is also helpful.  (Records kinda-sorta do that, but 
through component reader methods.)  I truly think those two steps are 
enough to make it clear to an author of a T-B3 that, if a T-B3 
container is accessible to untrusted parties, then it is free to take on 
any combination of field values at any time.  (And I’m using the word 
“free” here in the rigorous math sense, as in a free product type.)


A further step to nail down the message that the components are 
independently variable would be to provide a reconstructor syntax of 
some sort that amounted to an open invitation to (a) take an instance of 
the T-B3, (b) modify or replace any or all of its field values, and then 
(c) put it back in the container it came from.  By “open” I mean 
“public to all comers”, which means that every baseline Java 
programmer, who knows about public mutable fields (we can’t cure world 
hunger or negligent Java scribblers), will know that, using that syntax, 
anybody can write anything into any T-B3 value stored in an unprotected 
container.  Just like an OM-B1 object.  Nothing new to see, and all the 
old warnings apply!
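Today’s nearest user-written analogue of that (a)/(b)/(c) reconstructor cycle is a record with hand-written “wither” methods; the names `withX`/`withY` below are an invented convention, not proposed syntax:

```java
// Approximating the reconstructor steps with a record and hand-written
// "wither" methods: take an instance, replace a component, put it back.
record Point(int x, int y) {
    Point withX(int newX) { return new Point(newX, y); }
    Point withY(int newY) { return new Point(x, newY); }

    public static void main(String[] args) {
        Point[] container = { new Point(1, 2) };
        // (a) take the instance, (b) replace a field value, (c) put it back:
        container[0] = container[0].withX(99);
        System.out.println(container[0]);  // prints "Point[x=99, y=2]"
    }
}
```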


We would have to be careful about our messaging about immutability here, 
to prevent folks from mistakenly confusing a T-B3 with an immutable B1 
(I-B1) or B2 (all of which are truly immutable).


One way to do this, that would be blindingly obvious (and IMO too 
blinding), would be to (a) allow a `non-final` modifier on fields, 
canceling any implicit immutability property, and (b) *require* 
`non-final` modifiers on all fields in a T-B3 class.  I put this forward 
in the service of brainstorming, to show an extreme (too extreme IMO) 
way to forcibly advertise the T- in T-B3 classes.  But as I said, I 
think in practice it will be enough to make T-B3 classes look like OM-B1 
classes, which are clearly not immutable, even without a `non-final` 
modifier.




I don’t think we have to go so far as to outlaw tearing, but there 
have to be enough cues, at the use and declaration site, that 
something interesting is happening here.


Yes, cues.  And my point above, mainly, is that to the extent such cues 
are available in the world of OM-B1 classes already, we should make use 
of them for T-B3 classes.  And where not, such cues should make it 
really clear that there is an open invitation (public to untrusted 
parties) to make piecemeal edits to the fields of a T-B3 class.




There is no point to nanny people here given that only experts will 
want to play with it.


This is *definitely* wishful thinking.  People will hear that this is 
a tool for performance; 99% of Java developers will convince 
themselves they are experts because, performance!  Developers 
pathologically over-rotate towards whatever the Stack Overflow crowd 
says is faster.  (And so will Copilot.)  So, definitely no.  This 
argument is pure wishful thinking.   (I will admit to being 
occasionally tempted by this argument too, but then I snap out of it.)


I’m with Brian on this.

But we (the EG) can also fail and make primitive classes too easy to 
use; what scares me is people using a primitive class just because it's 
not nullable.


Yes, this is one of the many pitfalls we have to avoid!

This game is hard.


Yep.  Removing null for footprint, by moving 

Re: EG meeting, 2022-05-04

2022-05-04 Thread John Rose
On 4 May 2022, at 11:36, Kevin Bourrillion wrote:

>> - "Foo / Foo.ref is a backward default": Kevin and Brian argued that we
>> should prefer treating B3 classes as reference-default, with something like
>> '.val' to opt in to a primitive value type
>>
>
> I will say that I have not personally found the opposition to this change
> to be nearly as strong as the principal arguments in favor. It creates a
> very valuable uniformity in how things work. I hope it goes this way.

(This is hard to parse without that last little sentence.  I think I agree.)

For one thing, you can instantly see, by inspection of the source code,
whether a given variable permits null.

That advantage holds for simple variable declarations and array 
declarations, maybe even for generic type variables.

For another, Integer can just be itself, with Integer.val ≡ int.


Re: [External] : Re: User model stacking

2022-04-30 Thread John Rose
On 27 Apr 2022, at 16:12, Brian Goetz wrote:

> We can divide the VM flattening strategy into three rough categories (would 
> you like some milk with your eclair?):
>
>  - non-flat — use a pointer
>  - full-flat — inline the layout into the enclosing container, access with 
> narrow loads
>  - low-flat — use some combination of atomic operations to cram multiple 
> fields into 64 or 128 bits, access with wide loads

There’s another kind of strategy here; call it “fat-flat”.  That would 
encompass any hardware and/or software transactional-memory mechanism that uses 
storage of more than 64 bits.  I think all such techniques include a fast and 
slow path, which means unpredictable performance.  Such techniques usually 
require “slack” of some sort in the data structure, either otherwise unencoded 
states (like pseudo-oops) or extra words (injected STM headers).  This is not 
completely off the table, because (remember) we are often going to inject an 
extra word just to represent the null state.  In for a penny, in for a pound:  
If we add a word to encode the null state, it can also encode an inflated 
“synchronized access” state.  That’s part of the “VM physics” that Dan is 
asking about.
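The “in for a penny” idea can be modeled coarsely in ordinary Java (class and field names are invented for illustration): the value’s fields are stored flat, plus one extra word whose zero state encodes null, which is what lets a flattened layout distinguish null from an all-zero value.

```java
// A toy model of the "add a word for the null channel" layout idea.
// (Model only: real layout choices belong to the VM, and the spare
// states of the extra word could also encode an inflated
// "synchronized access" mode, as the text suggests.)
final class FlatNullableLong {
    private int nullChannel;  // 0 = null; 1 = value present
    private long payload;     // the flattened field(s)

    void store(Long v) {
        if (v == null) { nullChannel = 0; payload = 0L; }
        else { payload = v; nullChannel = 1; }
    }

    Long load() {
        return nullChannel == 0 ? null : payload;
    }

    public static void main(String[] args) {
        FlatNullableLong cell = new FlatNullableLong();
        cell.store(0L);  // all-zero payload, yet distinguishable from null
        System.out.println(cell.load());  // prints "0"
        cell.store(null);
        System.out.println(cell.load());  // prints "null"
    }
}
```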

>
> B1 will always take the non-flat strategy.  Non-volatile B3 that are smaller 
> than some threshold (e.g., full cache line) will prefer the full-flat 
> strategy.  Non-atomic B2 can also pursue the full-flat strategy, but may have 
> an extra field for the null channel.  Atomic B2/B3 may try the low-flat 
> strategy, and fall back to non-flat where necessary.  Volatiles will likely 
> choose non-flat, unless they fit in the CAS window.  But it is always VM’s 
> choice.

A fat-flat strategy can cover atomic B2/B3, even volatiles.

Thing to remember:  Even if a class designer selects the non-atomic option, a 
use-site volatile annotation surely overrides that.  A non-atomic B2 is a funny 
type: it is usually non-atomic, except for volatile variables.  That suggests 
to me there’s a hole in the user model: there is no way to select 
atomic-but-not-volatile use sites (variables and array elements, in particular) 
for non-atomic B2’s.


Re: Abstract class with fields implementing ValueObject

2022-02-09 Thread John Rose
That could be one of very many edge conditions in the JVMS that are not 
diagnosed directly by a validation, but that will eventually cause an error 
when the broken classfile is further used.

I don’t think there needs to be a special rule for this.  We don’t try to 
comprehensively diagnose all “impossible-to-use” classfiles.

On 9 Feb 2022, at 13:50, Frederic Parain wrote:

> There's a weird case that seems to be allowed by the Value Objects JVMS draft:
>
> An abstract class can declare non-static fields, which means it won't
> have the ACC_PERMITS_VALUE flag set, but also declare that it implements
> the ValueObject interface.
>
> The combination looks just wrong, because no class can subclass such class:
>   - identity classes are not allowed because of the presence  of
> the ValueObject interface
>   - value classes are not allowed because of the absence of
> ACC_PERMITS_VALUE
>
> I've looked for a rule that would prohibit such combination in the
> JVMS draft but couldn't find one.
>
> Did I miss something?
>
> Fred


Re: EG meeting, 2022-02-09 [SoV-3: constructor questions]

2022-02-09 Thread John Rose

On 8 Feb 2022, at 19:04, Dan Smith wrote:

"SoV-3: constructor questions": Dan asked about validation for `<init>` 
and `<vnew>` methods. Answer: the JVM doesn't care about `<init>` methods in 
abstract classes; the rules about `<vnew>` methods are still uncertain.


On the question of JVM validation of `<vnew>` methods, I’m in favor of 
as few rules as possible, ideally treating `<vnew>` as just another name. 
 Its super-power is not in its restrictions but in its 
conventionality:  It’s the obvious choice for constructor factory 
methods.  But it is not necessarily limited to that use.


Maximal limitation would be that a `<vnew>` method can only occur as the 
translation of a value-class constructor.  Any evidence in the classfile 
that it was not such a translation would be grounds for failing a 
validation check.  We’d make as many such rules as we can think of.


Arguments for maximal limitation:

 - Having a special method identifier in the JVM without other 
restrictions would be a new thing, and hence suspicious.
 - Limiting the use of `<vnew>` as much as possible makes it clear, to 
higher layers of the code (javac and reflection), what is going on in the 
class file, as a “reflection” of the source file.
 - Reflection of an irregular (non-source-conforming) `<vnew>` method 
has to be messy.  (Is it really a constructor?  Or is it just a method 
named `<vnew>`?)


Arguments against maximal limitation:

 - It is a new thing in the JVM for any descriptor to be constrained to 
mention the same name as the name of the constant pool item referred 
to by the `ClassFile.this_class` item (JVMS 4.1).  (It is suspicious.)
 - A maximal limitation would break hidden classes.  (They must 
sometimes return a supertype from their factories, since the HC is not 
always name-able in a descriptor.  HCs only work because of the previous 
point.)
 - A limitation might preclude a perhaps-desirable future translation 
strategy that used `<vnew>` factories uniformly to translate `new` source-code 
expressions (identity or value objects, uniformly).
 - A limitation could remove a natural translation strategy for 
“canonical factory methods” in non-concrete types.  This is a 
hypothetical language feature for Java or some other language.  (E.g., 
`new List(a,b,c)` instead of `List.of(a,b,c)`, removing the need of the 
user to remember whether the word was `of` or `make` or `build` or some 
other designer choice.)
 - Most any limitation would preclude ad hoc use of `<vnew>` factories 
by translation strategies of other languages, such as Scala and Clojure, 
which surely have their own uses of JVM object life cycles.  We want to 
be friendly to non-Java languages.


Compromise positions:
 - Require a `<vnew>` method to be `ACC_STATIC` but allow it for any 
purpose (i.e., any access and any descriptor).
 - Require a `<vnew>` method to return either the class named by 
`this_class` or some supertype (TBD how *this* should be checked).


I would prefer the first compromise:  It’s `static` but otherwise the 
JVM asks no questions.


Regarding reflection, I think it would be OK to surface all of the 
`<vnew>` methods (of whatever signature) on the `getConstructors` list, 
even if they return “something odd”.  Alternatively, to prevent a 
sharp edge we could have a new list `Class::getFactories`, and *copy* 
(not move) entries from that list onto `getConstructors` exactly when 
the return type matches the enclosing class.  That is a more natural 
move for reflection (which operates on runtime types) than for class 
file structuring (which is more static).
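The reflective rule sketched above can be modeled with today’s reflection API. Real code cannot declare a method with the special factory name, so the static factory `of` below is an invented stand-in; `factories` models the hypothetical `Class::getFactories`, and `constructorLike` models the copy-onto-`getConstructors` rule (copy exactly when the return type matches the class):

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Model of the proposed reflection treatment of factory methods.
final class FactoryModel {
    static List<Method> factories(Class<?> c) {
        List<Method> out = new ArrayList<>();
        for (Method m : c.getDeclaredMethods())
            if (Modifier.isStatic(m.getModifiers()) && m.getName().equals("of"))
                out.add(m);
        return out;
    }

    static List<Method> constructorLike(Class<?> c) {
        List<Method> out = new ArrayList<>();
        for (Method m : factories(c))
            if (m.getReturnType() == c)   // return type matches the class
                out.add(m);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(factories(Money.class).size());        // 2 factories
        System.out.println(constructorLike(Money.class).size());  // 1 constructor-like
    }
}

final class Money {
    static Money of(long cents) { return new Money(); }  // constructor-like
    static Object of(String s) { return new Money(); }   // returns "something odd"
}
```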


The reason I prefer to require `static` marking is that it would prevent 
the funny name from appearing on the list of regular methods, via 
reflection.





Re: [External] : Re: VM model and aconst_init

2022-01-27 Thread John Rose

On 25 Jan 2022, at 2:50, fo...@univ-mlv.fr wrote:

Let's talk a bit about having the L world and the Q world completely 
disjoint, at least from the bytecode verifier's POV.


Yes please; indeed, that is my strong recommendation.  Keeping the 
Q-types and L-types disjoint allows the interpreter (as well as the 
verifier) to concentrate on one “mode” at a time, for values on the 
stack.  This in turn unlocks implementation possibilities which are not 
as practical when `Q<:L` (our previous plan of record).


But keeping Q and L disjoint is a relative notion.  Clearly they have to 
interoperate *somewhere*.  And I believe that “somewhere”, where 
Q’s can become L’s and vice versa, is in the dynamic behavior of 
bytecodes, not in the type system (in the verifier) which relates 
multiple bytecodes.


It means that we need some checkcasts to move in both directions, from 
a Q-type to an L-type and vice-versa.


Correct.  Checkcast is the mode-change operator (both directions).  No 
need to create a new bytecode for either direction.


But at the same time, an array of L-type is a subtype of an array of 
Q-type ?


That depends on what you mean by “subtype”.  The JVM has multiple 
type systems.  In the verifier we might have a choice of allowing 
`Q[]<:L[]` or not.  It’s better not.  That puts no limitations on what 
the `aaload` and `aastore` instructions do.  They just dynamically sense 
whether the array is flattened or not and DTRT (do the right thing). 
That has nothing to do with verifier types.  In fact, it’s a separate 
job of the runtime type system.


For the runtime type system, we will say that arrays are covariant 
across an *extends* which is an augmentation of the *subtype* system.  
So for the runtime `T[]<:U[]` iff `T extends U`, so `Q[]<:L[]` (for the 
same class in both modes).  But in the verifier `Q[]` and `L[]` remain 
unrelated.  Don’t like it?  Sorry, that’s the best we can do, given 
Java’s history with arrays and the weakness of the verifier.
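Today’s reference arrays already work this way for the *extends* relation, with a dynamic check (`ArrayStoreException`) playing the role described for `aaload`/`aastore`; a small illustration:

```java
// Array covariance follows the extends relation, and bad stores are
// caught dynamically rather than by static/verifier typing.
class ArrayCovariance {
    public static void main(String[] args) {
        Integer[] ints = { 1, 2, 3 };
        Number[] nums = ints;  // ok: Integer extends Number, so Integer[] <: Number[]
        try {
            nums[0] = 3.14;    // compiles fine, but the store is checked at runtime
        } catch (ArrayStoreException e) {
            System.out.println("dynamic check rejected the store");
        }
        System.out.println(ints[0]);  // prints "1": the array is unchanged
    }
}
```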


The result is a very uncommon/unconventional type system, and I'm not 
a big fan of surprises in that area.


You are looking for a conventional type system involving arrays?  Then 
surely Java is not where you should be looking!  Arrays are always going 
to be an embarrassment, but it’s going to be more embarrassing overall 
if we let the whole design pivot around the task of minimizing array 
embarrassments.


Furthermore, I believe that subtyping is key to avoiding multiple 
bytecode verification of generic code.


I recommend a far simpler technique:  Just don’t.  Don’t 
multiple-verify, and don’t do anything that would *need* such 
extra verification (with different types) to give the same answer (as 
the first verification).  Also, don’t think of a JVM-loaded set of 
bytecodes as ever having more than one operational meaning.


(I don’t understand this obsession, not just with you, of imagining 
generic bytecodes as statically “recopied” or “recapitulated” or 
“stamped out” for each generic instance.  I have an uncomfortable 
feeling I’m missing something important about that.  But I have to 
say, that’s not how the JVM will ever work.  The JIT is in charge of 
managing copies of code paths, and the verifier should not be extended 
in any way to support multiple passes or multiple meanings or multiple 
copies.  Maybe the idea about retyping bytecodes is a hold-over from the 
Model 3 experiments, where prototypes were built that elaborated, at 
some sort of load time, multiple copies of originally generic bytecodes, 
for distinct instances.  That was a quagmire in several ways.)


So, let’s not treat generic bytecodes as anything other than 
singly-typed by a one-time pass of the verifier.  Any attempt to 
reverify later from a different POV just repeats earlier trips into 
the quagmire.


Now, the *source code* can be *imagined* to be re-typed in generic 
instances; that’s what generic code *means*.  But don’t confuse the 
*implementation* of generics with the *semantic model*.  The 
implementation will (in our plan of record) use something the verifier 
knows nothing about:  Dynamic type restrictions (or something 
equivalent) which manage the projections and embeddings of specific 
instance types into the generic bound type.  These *do not have to be* 
(indeed *must not be*, see below) identified as verifier types.  They 
are associations of pairs of JVM runtime types.
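A user-level model of that projection/embedding idea can be sketched in plain Java (the class and field names are invented; the real mechanism is a runtime association inside the VM, invisible to the verifier):

```java
import java.util.Optional;
import java.util.function.Function;

// A type restriction modeled as a projection/embedding pair between a
// specific type S and a generic bound type B.
final class Restriction<S, B> {
    final Function<S, B> embed;                // specific -> bound
    final Function<B, Optional<S>> project;    // bound -> specific, if possible

    Restriction(Function<S, B> embed, Function<B, Optional<S>> project) {
        this.embed = embed;
        this.project = project;
    }

    // Restriction of the bound type Object to the specific type Integer.
    static final Restriction<Integer, Object> INT = new Restriction<>(
            i -> i,
            o -> o instanceof Integer i ? Optional.of(i) : Optional.empty());

    public static void main(String[] args) {
        Object bound = INT.embed.apply(42);            // embed into the bound type
        System.out.println(INT.project.apply(bound));  // prints "Optional[42]"
        System.out.println(INT.project.apply("oops")); // prints "Optional.empty"
    }
}
```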


For example, with the TypeRestriction attribute [1], the restriction 
has to be a subtype of the declared type/descriptor.


No, just no.  (Not your fault; I suppose the TR docs are in a confusing 
state.)  There’s no reason to make that restriction (linking TRs to 
the verifier type system), other than the futile obsession I called out 
above.  And if you do, you will make yourself sorry when you remember 
the verifier’s I/J/F/D types.  And for many other reasons, such as 
instance types which are inaccessible to the generic code.  And the 
simple fact that bytecodes are not really generic-capable, directly, 
without serious (and futile) redesign.


Making bytecodes 

Re: The interfaces IdentityObject and ValueObject must die !

2022-01-26 Thread John Rose

On 26 Jan 2022, at 16:36, Dan Smith wrote:

An instance of a class is also an instance of (and carries the 
properties of) its superclasses. Value objects are instances of the 
class Object.


I can imagine a design in which we say that instances of Object may be 
either identity or value objects, but direct instances of the class 
are always identity objects. But this is not how we've handled the 
property anywhere else, and it breaks some invariants. We've gotten 
where we are because it seemed less disruptive to introduce a subclass 
of Object that can behave like a normal identity class.


And yet there is also a second way an object can be an instance of a class; it can be 
*exactly an instance of C*, when `x.getClass()==C.class`.  That’s the 
condition which can be teased apart here, if we allow ourselves to use 
something other than marker interfaces.  But marker interfaces (as I 
said) are committed to ignoring the “exactly an instance” condition, 
because they inherit.
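The two conditions can be contrasted directly in today’s Java (the toy classes `C` and `D` are invented): `instanceof` inherits, while the “exactly an instance” test does not.

```java
// "Instance of C" (x instanceof C) is inherited by subclasses;
// "exactly an instance of C" (x.getClass() == C.class) is not, and that
// is the distinction a marker interface cannot express.
class ExactInstance {
    static class C { }
    static class D extends C { }

    public static void main(String[] args) {
        Object d = new D();
        System.out.println(d instanceof C);           // true: inherited
        System.out.println(d.getClass() == C.class);  // false: not exact
        System.out.println(d.getClass() == D.class);  // true: exact
    }
}
```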

Re: The interfaces IdentityObject and ValueObject must die !

2022-01-26 Thread John Rose
[Sorry disregard last content-free message.  Still getting used to a new 
mail client.]


On 26 Jan 2022, at 7:42, Dan Smith wrote:

If we do not use interfaces, the runtime class of java.lang.Object can 
be Object; being an identity class or not is just a bit in the 
reified class, not a compile-time property; there is no contamination by 
inheritance.


Object can't be an identity class, at compile time or run time, 
because some subclasses of Object are value classes.


That’s true, but stripping away the marker interfaces removes (one 
part of) the contract that a class, as a whole, must always accurately 
report whether *all its instances* have one or the other property, of 
having identity or having no identity.


As I said in an earlier meeting, there are sometimes reasons to give 
*the same class* both value instances and identity instances.  Yes this 
muddies the user model but it also helps us with compatibility moves 
that are required, however much they muddy the user model.  (Actually all 
exact instances of `Object` will be identity objects.  I’m thinking of 
`Integer` which might want mostly values but some identity objects for 
backward compatibility.  Maybe.)


Independently of that, for the specific case of `Object`, having a query 
function `Class.instanceKind`, which returns “NONE” for abstracts 
and otherwise “VALUE” or “IDENTITY”, would encode the same information we 
are looking at with those marker interfaces.  But the contract for a 
method is *more flexible* than the contract of a marker interface.


In particular, `instanceKind` is not required to report the same thing 
for T and U when T<:U, but marker interfaces are forced to be consistent 
across T<:U.  I think this is an advantage, precisely because the method 
has a more flexible structure than the marker interface.
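A sketch of the query’s contract (the enum and helper are invented; today’s JVM has no value instances, so every concrete class reports IDENTITY here):

```java
import java.lang.reflect.Modifier;

// Sketch of the proposed Class.instanceKind query as a standalone
// helper.  Unlike a marker interface, a method like this is free to
// answer differently for T and U even when T <: U.
enum InstanceKind { NONE, VALUE, IDENTITY }

final class Kinds {
    static InstanceKind instanceKind(Class<?> c) {
        if (c.isInterface() || Modifier.isAbstract(c.getModifiers()))
            return InstanceKind.NONE;       // no direct instances
        return InstanceKind.IDENTITY;       // placeholder: no value classes yet
    }

    public static void main(String[] args) {
        System.out.println(instanceKind(Number.class));   // prints "NONE" (abstract)
        System.out.println(instanceKind(String.class));   // prints "IDENTITY"
        System.out.println(instanceKind(Runnable.class)); // prints "NONE" (interface)
    }
}
```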


If the marker interfaces also have little use as textual types (e.g., 
for bounds and method parameters) then I agree with Remi.  Ditch ‘em.

Re: The interfaces IdentityObject and ValueObject must die !

2022-01-26 Thread John Rose



On 26 Jan 2022, at 7:42, Dan Smith wrote:

On Jan 26, 2022, at 2:18 AM, 
fo...@univ-mlv.fr wrote:


In other words: I don't see a use case for distinguishing between 
primitive and

value classes with different interfaces.

Primitive classes do not allow nulls and are tearable; following 
your logic, there should be a subclass of ValueObject named 
PrimitiveObject that reflects those semantics.


But this isn't a property of the *class*, it's a property of the 
*type*, as used at a particular use site. If you want to know whether 
an array is flattened, the class of the component can't tell you.


This is especially useful when you have an array of PrimitiveObject: 
you know that storing null in an array of PrimitiveObject will 
always generate an NPE at runtime, and that you may have to use either 
volatile semantics or a lock when you read/write values from/to 
the array of PrimitiveObject.


For example:
 public void m(PrimitiveObject[] array, int index) {
   array[index] = null;  // can be a compile time error
 }

If we said

class Point implements PrimitiveObject

then it would be the case that

Point.ref[] <: PrimitiveObject[]

and so PrimitiveObject[] wouldn't mean what you want it to mean.

We could make a special rule that says primitive types are subtypes of 
a special interface, even though their class does not implement that 
interface. But that doesn't really work, either—primitive types are 
monomorphic. If you've got a variable with an interface type, you've 
got a reference.


We could also make a special rule that says arrays of primitive types 
implement an interface PrimitiveArray. More generally, we've 
considered enhancements to arrays where there are different 
implementations provided by different classes. That seems plausible, 
but it's orthogonal to the IdentityObject/ValueObject feature.


Meanwhile, I'd suggest writing the method like this, using universal 
generics:


  public <T> void m(T[] array, int index) {
   array[index] = null;  // null warning
 }

An impossible type is a type that can be declared but that no class will 
ever match.


Examples of impossible types, at declaration site:
 interface I extends ValueObject {}
 interface J extends IdentityObject {}
 <T extends I & J> void foo() { }

It would definitely be illegal to declare a class that extends I and 
J. Our rules about well-formedness for bounds have always been 
sketchy, but potentially that would be a malformed type variable.


Abandoning the property entirely would be a bigger deal.

If we do not use interfaces, the runtime class of java.lang.Object can 
be Object; being an identity class or not is just a bit in the 
reified class, not a compile-time property; there is no contamination by 
inheritance.


Object can't be an identity class, at compile time or run time, 
because some subclasses of Object are value classes.


What you'd need is a property of individual *objects*, not represented 
at all with the class. Theoretically possible, but like I said, a 
pretty big disruption to our current model.


For me, it's like opening the door of your house to an elephant 
because it has a nice hat and saying you will fix that with 
scotch-tape each time it touches something.


Ha. This sounds like maybe there's a French idiom involved, but anyway 
we should try to get John to add this to his repertoire of analogies.


Re: The interfaces IdentityObject and ValueObject must die !

2022-01-26 Thread John Rose
On 26 Jan 2022, at 7:42, Dan Smith wrote:

> For me, it's like opening the door of your house to an elephant because it 
> has a nice hat and saying you will fix that with scotch-tape each time it 
> touches something.
>
> Ha. This sounds like maybe there's a French idiom involved, but anyway we 
> should try to get John to add this to his repertoire of analogies.

Is this elephant also being followed around by a crowd of blind men?  Too bad they 
can’t see its nice hat.

See also:  A bull walked into a china shop.  The owner said, “nice hat!”


Re: [External] : Re: VM model and aconst_init

2022-01-19 Thread John Rose
On 12 Jan 2022, at 8:45, fo...@univ-mlv.fr wrote:

>> From: "Brian Goetz" 
>> To: "Remi Forax" 
>> Cc: "valhalla-spec-experts" 
>> Sent: Wednesday, January 12, 2022 2:30:00 PM
>> Subject: Re: [External] : Re: VM model and aconst_init
>
>> The operand of C_Class is a weird beast. It can be an internal name
>> (com/foo/Bar), but it can also be a *descriptor* for an array type. Valhalla
>> extends it to allow Q descriptors as well (but not L descriptors --
>> there
>> should be one way to say C_Class[String].)
> Your explanation makes sense but it's not what this sentence says
> """
> The sole operand of this bytecode is a reference to a CONSTANT_Class item 
> giving the internal binary name of the value class (not its Q descriptor)
> """
>

That was… aspirational.


Re: VM model and aconst_init

2022-01-19 Thread John Rose

On 12 Jan 2022, at 5:14, fo...@univ-mlv.fr wrote:

Ok, but in that case how does the verifier know whether aconst_init 
generates a Q-type or an L-type, given that aconst_init takes a 
CONSTANT_Class and not a descriptor as its operand?


In the terms of my previous message, the `CONSTANT_Class` item
in the CP must be “modal”, refer to either an L-type or Q-type.
That’s true whether it is the direct operand of `aconst_init`
or a sub-operand of `withfield`.

All of this (about pervasive use of bimodal `C_Class` items) is
hard to avoid.  I would have preferred to say that `C_Class` items
are always for L-types, but that runs afoul of other requirements,
notably that Point.class be the mirror for Q-Point not L-Point
if Point is a B3-capable class.


Re: VM model and aconst_init

2022-01-19 Thread John Rose

I think (based on our most recent conversations)
that `aconst_init` can return a Q-type for B3 types
and an L-type for B2 types.  And likewise for the input
and output of `withfield`.

The net result is that both bytecodes need to be
permissive about L and Q types, because B2 and B3
translation strategies require distinct parallel use
cases.

This is probably not clear in the docs, but I think
it makes sense.

Can you mix and match both modes in the same method?
Probably, since the interpreter doesn’t care about
multi-bytecode patterns.  Dunno if this causes a testing
problem, and if so how to fix it.  I think it’s probably
OK, especially if we require the two-way checkcast
(Q-Foo not a subtype of L-Foo in the verifier) so that
each mode stays “in its own lane”.

More explicitly, this is a set of use cases for using
Q-types in C_Class entries in the constant pool to switch
to Q-mode for bytecodes that refer to classes, including
`withfield` and `aconst_init`.

On 12 Jan 2022, at 4:31, Remi Forax wrote:


I have some trouble wrapping my head around these two sentences:

"""
aconst_init is the analogue of new for value objects; it leaves a 
reference to the initial value for a value class on the stack. This 
initial value is guaranteed to not be equal to null. The sole operand 
of this bytecode is a reference to a CONSTANT_Class item giving the 
internal binary name of the value class (not its Q descriptor).

"""

and
"""
Both withfield and aconst_init return a Q type if and only if their 
class is a primitive class.

"""

The second is ambiguous because it's not clear if aconst_init can 
return an L-type. I suppose it cannot, but this is not clear at all.


If this is the case, what is the use case for withfield taking an 
L-type as a parameter?


regards,
Rémi

Re: Terminology bikeshedding summary

2022-01-12 Thread John Rose
#3 is more like:

primitive value vs. bare object
 Or
primitive value vs. [something else] object

Where “object” is split not only into value objects and identity objects; bare 
(value) objects are another split. You get bare vs. heap just as you get value 
vs. identity as cleavage planes in the universe of objects.

Also:

Legacy primitives would not be objects in any case. But we can mock up classes 
to wrap and/or emulate them and even declare in the user model that these very 
special classes in some sense “are” the primitives. 

> On Jan 12, 2022, at 3:27 PM, Dan Smith  wrote:
> 
> #3:
> 
> primitive value vs. object


Re: Updated State of Valhalla documents

2022-01-05 Thread John Rose


> On Jan 5, 2022, at 4:45 PM, Dan Smith  wrote:
> 
> Not talking about the VM. I'm talking about the language model.
> 
>> A primitive (B3) does not provide proper encapsulation unlike a classical 
>> Java class (the one spelt "class" in the language),
> 
> You should say "object" here, not "class". Primitive values have classes, 
> even though they are not objects.

Yes. And what’s more, Remi’s point about encapsulation is weak, because we can 
(possibly) assume that every author of a primitive class has checked those 
boxes off, saying that all-zero default is a valid value and tearing is 
acceptable. There are plenty of Java B1 classes today that are designed with 
such weaknesses. Class abstractions come in various strengths as selected by 
each class’s author. Selecting primitive for a class forces the author to give 
up some abstraction but keeps most abstraction decisions intact. 

Having the required hardwired null-arg constructor syntactically present is an 
interesting idea to ensure that the author has explicitly “checked the box” 
about the default value. Not sure it’s worth it though. 

Re: [External] : Re: Updated State of Valhalla documents

2021-12-23 Thread John Rose

On 23 Dec 2021, at 11:26, fo...@univ-mlv.fr wrote:

For "value", we know that we want value class and value record, so 
it's more like a modifier.
For primitive, do we want a primitive record? The VM supports it, but 
do we want to offer that possibility in Java ?
My gut feeling is that the answer is "No" because of what Kevin said 
earlier, we should drive users to use value classes instead of 
primitives.


Good points, though not sure if they carry the decision completely the 
other way.  The VM sees primitive as a classfile modifier.  (The 
`ACC_PRIMITIVE` modifier flag!)  You are raising the question of whether 
this is smart for the language as well.  For further discussion and 
perhaps experimentation.


Re: Updated State of Valhalla documents

2021-12-23 Thread John Rose


On Dec 23, 2021, at 10:35 AM, Remi Forax  wrote:





From: "Brian Goetz" 
To: "valhalla-spec-experts" 
Sent: Thursday, December 23, 2021 6:14:43 PM
Subject: Updated State of Valhalla documents
Just in time for Christmas, the latest State of Valhalla is available!

https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/01-background
https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/02-object-model
https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/03-vm-model

The main focus for the last year has been finding the right way to expose the 
Valhalla features in the user model, in a way that is cleanly factored, 
intuitive, and clearly connects with where the platform has come from.  I am 
very pleased with where this has landed.

There are several more installments in the works, but these should give plenty 
to chew on for now!

I've done a rapid reading,
in the object-model
   primitive class Point implements Serializable

should be
   primitive Point implements Serializable

"value" is a modifier but "primitive" is a top level type.

I call bike shed on that!  Since a primitive class file defines two types we 
have a choice in how to convey that in the source notation. This may evolve 
further of course and even to the place you suggest.


The design in part 3 is cool, because if i'm not mistaken, you can implement 
value classes without the support of Qtype in the classfile.


Thank you. That is correct!  This is a big result of the refactoring work, and 
it leads to a lower total complexity.

Rémi


Re: We have to talk about "primitive".

2021-12-15 Thread John Rose

On 15 Dec 2021, at 10:42, Kevin Bourrillion wrote:


…
The main problem I think we can't escape is that we'll still need some 
word
that means only the eight predefined types. (For the sake of argument 
let's
assume we can pick one and lean hard on it, whether that's 
"predefined",

"built-in", "elemental", "leaf type", or whatever.)


As others have said, we’ll pick a term for this.  The idea of calling 
out a “leaf” in a data graph is compelling to me.  As you say, 
people are going to wonder what is the foundation of the whole scheme.  
(No it’s not objects all the way down, at least that’s not what we 
are aiming for.)


(But—spoiler alert—the division between leaf/scalar/basic type and 
composite/class type is *less important in daily practice* than the ad 
hoc mental models programmers make about which types they choose to view 
as composite and which are indivisible.  Typical example:  Most 
programmers choose to regard `String` as a sort of nullable primitive.  
I’ll pick up that thread later.)


I like the term “basic type”, and (as we already discussed) I like 
“scalar” also, because “scalar” correctly suggests something 
about how it’s processed in hardware.


Here’s a point I think is also important and has not been discussed 
much yet:  A concept like “basic type” (or “scalar type”) should 
include references as well as Java’s eight current primitive types.  
Like an `int` or other basic primitive, a reference is copied by value, 
processed efficiently (probably in a hardware register), and is a 
“leaf item” with respect to a single object layout or method type 
signature.  Also, like `int`, a reference has its own special operators 
in the language and special bytecodes in the JVM.  Like `int`, it has a 
default value `null` (instead of `0`).
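The parallel between reference defaults and `int` defaults can be checked directly in today's Java (the class name below is illustrative):

```java
// Every basic type has a built-in default value that fields start life
// holding: 0 for int-like primitives, 0.0 for floating point, null for
// references.
class Defaults {
    int i;       // defaults to 0
    double d;    // defaults to 0.0
    Object ref;  // defaults to null
}
```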


The main difference of a reference from an `int` is the fact that it has 
a far end:  You can often (not always) find other values by indirecting 
the reference and loading a field or calling a method or querying a 
super type.  (Because it has a far end, it also has a nominal subtype to 
classify what might be at the far end.  But I’m speaking here about 
references per se, apart from their subtypes.)  Despite their “far 
end”, people treat some reference types, like `String`, as if they 
were leaves; you stop at the `String` and don’t bother thinking about 
its fields.  Users don’t care that there’s an array somewhere on the 
other end, unless they are engineering the string class itself.  So a 
reference has a far end, unlike an `int`, but, like an `int`, a 
reference *often* is treated like an unstructured value, in code.


Bottom line:  There are a handful of built-in basic types.  These are 
used to compose classes.  They are the primitives and the references.  
When we consider a reference apart from its class (say, as `jl.Object`), 
it can be comfortably called a *basic type*, and then that handful of 
built-in basic types consists of the (basic) primitives and references.


OK, that’s enough on that.  Whether “reference” is a basic type is 
less important than how we choose to extend (or not extend) the reach of 
the term “primitive”.


For historic reasons we use the word ~~fruit~~ *primitive* to mean a 
basic type other than a reference.  Now that we have user-defined 
`int`-like things, we have to decide whether and how to connect the old 
word to the new things.  Since user-defined `int`-like things are (we 
think) very like `int` in many ways, a term like “extended 
primitive” makes sense.


This is how I get to the terms “basic primitive” and “extended 
primitive”.  Or “scalar primitive” and “extended primitive”.


As I read your messages, you would prefer to keep the term 
“primitive” narrow, because of the possible confusion of telling 
users “hey, what you think of as primitives are now the ~~heirloom~~ 
basic primitives.”  Personally, I think users will say, to our 
unveiling “extended primitives”, something like this:


Well, that’s not exactly what the dictionary says primitive means, 
if you can make new composite ones.  But I do know that Java has 
non-reference types and calls them “primitive”.  And I also know 
it would be really cool to define new types that work like `int`, 
such as `UnsignedInt` or `HalfFloat` or the like.  I get why they 
don’t want to build all such types into the language; in fact maybe 
I’d like to try my hand someday at defining my own.  So, 
“extended primitive”.  It’s on:  The Java primitives are now an 
open-ended set just like the Java objects.


In other words, in saying “extended primitive” (and also “basic 
primitive”) we lean away from the dictionary definition of 
“primitive” and into the Java definition.  That feels like a 
non-confusing choice to me.




Definitely, our trying to minimize their specialness is virtuous.


Yep.  We also call this “healing the rift”, sometimes.


…
So we have to attempt to shift users' understanding of "primitive" 
while at
the same time injecting a new term 

Re: [External] : Re: We have to talk about "primitive".

2021-12-15 Thread John Rose

On 15 Dec 2021, at 15:06, Brian Goetz wrote:


It took us a while to unravel this one, but I think we did.
… What this says is that tearing/non-tearing is a property of 
reference-vs-primitive-ness; accessing a (fat) value through a 
reference gives you *more guarantees* than accessing it directly. 
(Correspondingly, this has more costs.)


All of this is to say, as I think you are saying: primitives of a 
certain size were always tearable, and they still are; references 
never were, and they are still not.


Of course references don’t tear, and more to the point, `final` fields 
reached by references also don’t tear, because they are (a) safely 
published and (b) never mutated after publication.  So, as Brian says, 
wrapping a reference around some chunks of state has a special benefit 
(as well as a special cost).  The reference wrapper freezes those chunks 
in place, relative to each other.
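The "reference wrapper freezes the chunks" point can be sketched in current Java (class names are mine; actual tearing is only permitted for non-volatile 64-bit primitives, per JLS 17.7, so the racy case is shown but not provoked):

```java
// A bare long field may be read half-written under a data race (the JMM
// permits word tearing of non-volatile long/double on some JVMs). Wrapping
// the same state behind a final field safely publishes it: any reader that
// sees the reference sees the whole value, frozen at construction.
class RacyHolder {
    long value;               // may tear under races on some JVMs
}

final class FrozenHolder {
    final long value;         // never observed torn once published
    FrozenHolder(long v) { value = v; }
}
```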

Re: [External] : Re: basic conceptual model

2021-12-13 Thread John Rose
On 13 Dec 2021, at 19:05, Kevin Bourrillion wrote:

> …
> Yes, in general I am sure that I can't accomplish actual ground up
> non-cyclical definition-definitions here. I think it should suffice to be
> descriptive enough for the reader to course-correct their previous notions
> in this direction (provided they want to).

Yup, I see that’s how it’s working in there.
>
>
>> Saying “unit” is more mysterious.  You certainly don’t mean units of
>> measure, or functional programming unit types.  Are you meaning to imply
>> that it has no subparts which might also be termed units?
>
>
> Oh, I actually do not want to imply irreducibility at all. That all values
> have had that property in Java is a fact I would label as
> incidental-not-essential.
>
> Glob, gob, blob, hunk, chunk, piece, …

In that case I claim unit has the wrong connotation, since it does (often) come 
with an expectation of irreducibility.  With that in mind I like the unassuming 
term “piece”, or those other words.  If you are still in thesaurus mode:

https://www.thesaurus.com/browse/portion

>
>
>
>> That’s OK as long as you have today’s primitives (which I like to call
>> “scalar primitives”) and of course references (which are also scalars).  By
>> “scalar” I mean an item of data that is not composed of further scalars.
>>
>
> A tangent, but there's enough math major still in me to object to this. :-)
> Scalars are scalar because they scale things! This would be more similar to
> a one-dimensional vector space than to a scalar … imho the best
> adjective for today's primitives is "primitive" and I'll plead my case
> about that soon too. :-)

Sure, that’s a good position for math majors like you and me.  And I’m sure 
you/they/we really squirm in the presence of discussions about “vector 
processing units” and “vector ISAs”.  But the squirm-worthy folks that define 
VPUs also use the term “scalar” to mean “the value that’s in a vector lane”, 
and they assuredly do not mean that “scalar” can be identified with 
“single-lane vector”.


Re: basic conceptual model

2021-12-13 Thread John Rose
Two more thoughts:  You could get away with saying “indivisible unit”; I think 
that would convey much of what you mean.  Also, a footnote drawing the reader’s 
attention to native hardware types (long, byte, float, reference) would make it 
clear that a Java computation is meant to “bottom out” in operations on units 
of data familiar to assembly programmers.  They are indivisible units, but even 
more important, their operations are natural to real computers.

On 13 Dec 2021, at 18:40, John Rose wrote:

> I have some comments.  Since the doc invites directly stuck-on comments, I’ve 
> requested edit permission, as that seems necessary for me to stick on a 
> comment.
>
> Some free-floating notes:
>
> Good use of “freely copyable” as a concept.  There’s a tough case, happily 
> not relevant to Java, of linear types (IIRC Rust has them) where a value is 
> freely copyable, but only to the extent that the source forgets the value 
> after the sink gets it.  Accounting for that would stress your terminology.
>
> Another (more subtle) stress to your terminology is your assertion that a 
> mutable variable “forgets” the previous value when a new value is stored.  
> That isn’t strictly correct in the case of race conditions.  Only a volatile 
> variable reliably “forgets” its previous value in the presence of races.
>
> You don’t actually define the term “value” but just illustrate it and make 
> claims about it.  Maybe you have to do it that way…  Actually, you say it’s 
> “unit of data”.  Referring to “data” as a known term (for readers who are 
> programmers) is OK.
>
> Saying “unit” is more mysterious.  You certainly don’t mean units of measure, 
> or functional programming unit types.  Are you meaning to imply that it has 
> no subparts which might also be termed units?  That’s OK as long as you have 
> today’s primitives (which I like to call “scalar primitives”) and of course 
> references (which are also scalars).  By “scalar” I mean an item of data that 
> is not composed of further scalars.


Re: basic conceptual model

2021-12-13 Thread John Rose
I have some comments.  Since the doc invites directly stuck-on comments, I’ve 
requested edit permission, as that seems necessary for me to stick on a comment.

Some free-floating notes:

Good use of “freely copyable” as a concept.  There’s a tough case, happily not 
relevant to Java, of linear types (IIRC Rust has them) where a value is freely 
copyable, but only to the extent that the source forgets the value after the 
sink gets it.  Accounting for that would stress your terminology.

Another (more subtle) stress to your terminology is your assertion that a 
mutable variable “forgets” the previous value when a new value is stored.  That 
isn’t strictly correct in the case of race conditions.  Only a volatile 
variable reliably “forgets” its previous value in the presence of races.
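The volatile point can be demonstrated (names below are mine): without `volatile`, the spinning reader may hold the stale "remembered" value in a register indefinitely; with it, the write reliably becomes visible.

```java
// With 'volatile', the spinning thread is guaranteed to eventually observe
// the write to 'done'; without it, the memory model permits the loop to
// spin forever on the stale previous value.
public class VisibilityDemo {
    static volatile boolean done;

    public static void main(String[] args) throws InterruptedException {
        Thread spinner = new Thread(() -> {
            while (!done) { /* spin until the write becomes visible */ }
        });
        spinner.start();
        Thread.sleep(50);
        done = true;      // volatile write: happens-before the read that exits the loop
        spinner.join();   // terminates reliably only because 'done' is volatile
    }
}
```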

You don’t actually define the term “value” but just illustrate it and make 
claims about it.  Maybe you have to do it that way…  Actually, you say it’s 
“unit of data”.  Referring to “data” as a known term (for readers who are 
programmers) is OK.

Saying “unit” is more mysterious.  You certainly don’t mean units of measure, 
or functional programming unit types.  Are you meaning to imply that it has no 
subparts which might also be termed units?  That’s OK as long as you have 
today’s primitives (which I like to call “scalar primitives”) and of course 
references (which are also scalars).  By “scalar” I mean an item of data that 
is not composed of further scalars.


Re: [External] : Re: Proposal: Static/final constructors for bucket-3 primitive classes.

2021-12-09 Thread John Rose

On 9 Dec 2021, at 7:25, fo...@univ-mlv.fr wrote:

We may do something like that in a possible future, but i think it's 
more important to make the semantics of B3 visible front and center.


If you can only say one thing in such an explicit no-arg constructor 
(true initially and maybe forever) then it surely is strange that the 
silly thing has a body.  So that leads to some un-bodied presentation 
like `class P { public default P(); }`, which could be made more 
expressive later (or never, probably).


But that, in turn, hits near to one of the places where Java *already 
set the default* (rightly or wrongly). Java defines, under some 
circumstances, the no-arg constructor for a class implicitly.  Arguably 
this precedent applies (though not exactly) to the current case, of 
default construction of the default value.


I think, in the end, making a new primitive (as opposed to a new value 
class) is going to be an activity for library experts, not end users.  
Maybe the IDEs (not the JLS) can help them avoid pitfalls, but 
primitives are inherently tricky things to define.  This means either 
that (a) it’s OK to force the experts to do the extra ceremony, or 
(b) it’s OK to assume they know the rules of that game, and the 
ceremony won’t add anything.  I incline towards (b).


The vision I’m assuming here is that a _bare primitive_ is something 
inherently loosely assembled.  It’s really just a bundle of scalar 
values.  If you want a class wrapped around that bundle, you should be 
declaring your value as a _primitive reference_ (assuming the option for 
the bare primitive must also be provided) or declaring your type as a 
true _value class_ (if the option for the bare primitive is not so 
important).


P.S. A friend kindly helped me update my metaphor firmware.  I meant to 
say that pushing the feature under discussion would lead us along a path 
of pain, with various experiences along the way.  But obviously not 
existential Jacksonian pain.  And that’s all I want to say here about 
that.


Re: [External] : Re: Proposal: Static/final constructors for bucket-3 primitive classes.

2021-12-09 Thread John Rose
On Dec 8, 2021, at 11:12 PM, Remi Forax  wrote:
> 
> I fully agree, i think it's better to do the opposite

I snapped a few neurons trying to read that the first time. 

> and force the fact that all primitive value classes (Bucket 3) must have a 
> default constructor and that constructor have a fixed bytecode instructions.

Heavy on ceremony even for Java especially if you can’t do anything valuable in 
the constructor body. 
> 
> If a user does not provide a constructor without parameter, the compiler will 
> provide one and the verifier will check that this constructor exist.

That’s JVM ceremony, to what end?

Maybe we should disallow no-arg constructors altogether and leave room for a 
possible future feature along the lines of the special init phase. That future 
feature would run ad hoc byte codes at class preparation time to build the 
default value and would throw an error if it touched the class. Kind of like 
superclass init actions; after those and before the proper clinit call. 

It’s possible but not a priority, because of the various expenses I sketched. 
So we could leave space for it to put in later if the costs were justified 
after all. 

Re: Proposal: Static/final constructors for bucket-3 primitive classes.

2021-12-08 Thread John Rose
We have considered, at various points in the last six years or more, 
allowing user-defined primitive types to define (under user control) 
their own default values.  The syntax is unimportant, but the concept is 
simple:  Surely the user who defines a primitive type can also define 
default initializer expressions for each of the fields.


But this would be a trail of tears, which we have chosen to avoid, each 
time the suggestion comes up.


This feature is often visualized as a predefined bit pattern, which the 
JVM would keep handy, and just stamp down wherever a default initializer 
is needed.  It can’t really be that simple, but even such a bit 
pattern is problematic.


First of all is the problem of declaring the bit pattern.  Java natively 
uses the side effects of `` to define constants using ad hoc 
bytecodes; it also defines (for some types but not others) a concept of 
constant expression.  Neither of those fits well into a classfile that 
would define a primitive with a default bit pattern.


If the bit pattern is defined using ad hoc bytecode, it must be defined 
in a new pseudo-method (not ``), to execute not *during* the 
initialization of the newly-declared primitive class, but *before*.  
(Surely not! a reader might exclaim, but this is the sort of subtlety we 
have to deal with.)  During initialization of a class C, all fields of 
its own type C must be initialized *before* the first bytecode of 
`` executes, so that the static initializer code has something 
to write on.  So there must be a “default value definition” phase, 
call it ``, added after linking and before 
initialization of C, so C’s `` method has something to work 
with.  This `` is really the body of a no-argument 
constructor of C, or its twin.  A no-argument constructor of C is not a 
problem, but having it execute before C’s `` block is a huge 
irregularity, which the JVM spec is not organized to support, at 
present.


This would turn into both JVMS and JLS spec. complexity, and more odd 
corners (and odd states) in the Java user experience.  Sure, a user will 
say, “but I promise not to do anything odd; I just want *this field* 
to be the value `(int)1`”.  Yes, but a spec. must define not only the 
expected usages, but all possible usages, with no poorly-defined states.


OK, so if `` is not the place to define this 
elusive bit pattern, what about something more declarative, like a 
`ConstantValue` attribute?  Surely we could put a similarly structured 
`DefaultValue` attribute on every non-static field of a value type, and 
that would give the JVM enough information to synthesize the required 
bit pattern *before* it runs ``.


Consider the user model here:  A primitive declaration would allow its 
fields to have non-zero default values, *but only drawn from the 
restricted set of constant expressions*, because those are the ones 
which fit in the `ConstantValue` attribute.  (They are true bit patterns 
in the constant pool, plus `String` constants.)  There is no previous 
place in Java where we make such a restriction, except `case` labels.  
Can you hear the groans of users as we try to explain why only constant 
expressions are allowed in that context?  That’s the muzak of the 
trail of tears I mentioned above.


But we have condy to fix that (someone will surely say).  But that’s 
problematic, because the resolution of constant pool constants of a 
class C requires C to be at least linked, and if the condy expression 
makes a self-reference to C itself, that will trigger C’s 
initialization, at an awkward moment.  Have you ever debugged a tangled 
initialization circularity, marked by mysterious NPEs on variables you 
*know* you initialized?  I have.  It’s a stop on the trail of tears I 
mentioned.
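That circularity trap is reproducible in plain Java today (class names are mine): a static initializer calls into another class before its own field is assigned, and the recursive reference quietly observes the default value.

```java
// Class-initialization circularity: while InitA's static initializer is
// running, a call into InitB reads InitA.VALUE before it is assigned. The
// JVM permits the recursive reference (same thread), so InitB silently
// sees the default null.
class InitA {
    static final Integer OBSERVED = InitB.peek();       // runs mid-initialization
    static final Integer VALUE = Integer.valueOf(42);   // deliberately not a compile-time constant
}

class InitB {
    static Integer peek() { return InitA.VALUE; }       // InitA not yet fully initialized: null
}
```

Touching `InitA.OBSERVED` first yields `null` even though `VALUE` "is initialized" to 42, which is exactly the breeding ground for the mysterious NPEs described above.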


But if we really worked hard, and added a bunch of stuff to the JVMS and 
JLS, and persuaded users not to bother us about the odd restrictions (to 
constant expressions, or expressions which “don’t touch the class 
itself”), we *could* define some sort of declarative default value 
initialization.


What then?  Well, ask the JVM engineers how they initialize heap 
variables, because those are the affected paths.  Those parts of the JVM 
are among the most performance-sensitive.  Currently, when a new object 
or array is created, its whole body (except the header) is sprayed with 
a nice even coat of all-zero-bit machine words.  This is pretty fast, 
and it’s important to keep it fast.  What if creating an array 
required painting some beautifully crafted arabesque of a bit pattern 
defined by a creative user?  Well, it’s doable, but much more 
complicated.  You need to load the bit pattern into live registers and 
(if it’s an array of C) keep them live while you paint the whole 
array.  That’s got to be more expensive than spraying zeroes.  
(There’s even hardware that’s good for spraying zeroes, on some 
machines.)  Basically, if we generously allowed users even a limited set 
of pre-defined default 
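The cost asymmetry described above can be felt even from the Java level (the method and sizes below are mine): allocation hands back pre-zeroed storage, while a non-zero default would imply an extra fill pass over every newly created array.

```java
import java.util.Arrays;

// new long[n] is zero-filled by the JVM as part of allocation; a
// user-defined non-zero default would require something like the explicit
// second pass sketched here.
public class DefaultFillSketch {
    public static long[] withCustomDefault(int n, long pattern) {
        long[] a = new long[n];    // zeroed "for free" at allocation
        Arrays.fill(a, pattern);   // the extra work a custom default implies
        return a;
    }
}
```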

Re: [External] : Re: JEP update: Value Objects

2021-12-01 Thread John Rose
On Dec 1, 2021, at 3:56 PM, John Rose 
mailto:john.r.r...@oracle.com>> wrote:

There is the converse problem that comes from the redundancy:
What happens if the class directly implements or inherits ValueObject
and ACC_VALUE is not set?  I guess that is an error also.

I hit send too soon:  That’s probably true for concrete classes.
For abstracts, ACC_VALUE must not be set (yes?) and ValueObject
“just flows” along with all the other super types, with no particular
notice.  It all comes together when ACC_VALUE appears, and that
must be on a final, concrete class.

I keep wondering what ACC_VALUE “should mean” for an abstract.
Maybe it “should mean” that the abstract is thereby also forced to
implement VO, so that all subtypes will be VO’s.

The slightly different meaning of ACC_PERMITS_VALUE is “hold
off on injecting IdentityObject at this point”.  Because the type
might allow subtypes that implement VO (whether abstract or
concrete).  At this point it also allows IdentityObject to be
introduced in subtypes.  Mmm… It could also have been
spelled ACC_NOT_NECESSARILY_IDENTITY.

As we said in the meeting, it seems to need magic injection of
IdObj, even if we can require non-magic explicit presence of VO.
Dan H., will the metadata pointer of IdObj be a problem to access,
if it is magically injected?


Re: [External] : Re: JEP update: Value Objects

2021-12-01 Thread John Rose
On Dec 1, 2021, at 3:29 PM, Dan Smith 
mailto:daniel.sm...@oracle.com>> wrote:

So we went down the path of "maybe there's no need for a flag at all" in 
today's meeting, and it might be worth more consideration, but I convinced 
myself that the ACC_VALUE flag serves a useful purpose for validation and 
clarifying intent that can't be reproduced by a "directly/indirectly extends 
ValueObject" test.

As you suggest, though, we could mandate that ACC_VALUE implies 'implements 
ValueObject’.

Assuming ACC_VALUE is part of the design, there are actually four
things we can specify, for the case when a class file has ACC_VALUE set:

A. Inject ValueObject as a direct interface, whether or not it was already 
inherited.
B. Inject ValueObject as a direct interface, if  it is not already inherited.
C. Require ValueObject to be present as a direct interface, whether or not it 
was already inherited.
D. Require ValueObject to be present as an interface, either direct or 
inherited.

A and B will look magic to reflection.
B is slightly more parsimonious and less predictable than A.
C and D are less magic to reflection, and require a bit more “ceremony” in the 
class file.
D is less ceremony than C.
Also, the D condition is a normal subtype condition, while the C condition is 
unusual to the JVM.

I guess I prefer C and D over A and B because of the reflection magic problem,
and also because of Dan H’s issue (IIUC) about “where do we look for the
metadata, if not in somebody’s constant pool?”

Since D and C have about equal practical effect, and D is both simpler to
specify and less ceremony, I prefer D best of all.

I agree that ACC_VALUE is useful to prevent “action at a distance”.

There is the converse problem that comes from the redundancy:
What happens if the class directly implements or inherits ValueObject
and ACC_VALUE is not set?  I guess that is an error also.

— John



Re: aconst_init

2021-12-01 Thread John Rose
On Dec 1, 2021, at 7:58 AM, Dan Heidinga  wrote:
> 
> Splitting a new thread off from Dan's email about the jep draft to
> talk about the `aconst_init` bytecode:
> 
>> aconst_init, with a CONSTANT_Class operand, produces an instance of the 
>> named value class, with all fields set to their default values. This 
>> operation always has private access: a linkage error occurs if anyone other 
>> than the value class or its nestmates attempts an aconst_init operation.
> 
> Can you confirm if this is purely a rename of the previous
> defaultvalue / initialvalue bytecodes?

I can confirm this, with one important exception:   The defaultvalue
bytecode has no access restrictions, while the aconst_init/initialvalue
bytecode does.

> I'm wondering how the name fits the eventual primitive values and
> their uses.  Will they also use this bytecode or will they continue to
> use a defaultvalue version?

For this reason, aconst_init/initialvalue is not useful for B3 types.
I think there is no need for yet another bytecode to cover the B3
types.  Instead, Class::__InitialValue should return either null for
B1/B2 types (or any reference types: polys and arrays), or the
(boxed) zero for primitives, starting with int.class.

assert Integer.class.__InitialValue() == null;
assert int.class.__InitialValue() == 0;
assert Point.class.__InitialValue() == (new Point[1])[0];
assert Point.ref.class.__InitialValue() == null;

(__InitialValue is not really the eventual method name.)
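Setting aside the placeholder name, the specified behavior for existing types can be approximated today with array reflection (the helper name is mine):

```java
import java.lang.reflect.Array;

// Approximates the hypothetical Class::__InitialValue for today's types:
// a fresh one-element array exposes the type's default value, boxed for
// primitives and null for reference types.
public class InitialValues {
    static Object initialValue(Class<?> c) {
        return Array.get(Array.newInstance(c, 1), 0);
    }
}
```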

> The expected bytecode pattern for a "" factory method is something like:
>  aconst_init MyValue
>  iconst1
>  withfield MyValue.x:I
>  areturn
> Correct?

Yes, although it’s likely there are intervening astore_0 and
aload_0 instructions, since ’this’ is probably modeled by the
compiler as local[0].

By the way, this raises the question of how vigorously
the JVM should perform structural checks on the new
features, to ensure they are only used in the ways we
expect.  I think in general such checks should be
justified individually, rather than be applied by default.

Since  is just a static factory method, I would prefer
(though I understand reasons to the contrary) to have the
JVMS be agnostic about where  methods can occur.
In other words, treat  like a plain identifier; maybe
require that it be marked ACC_STATIC but allow it to
work like a nameless factory method in any context
where a classfile generator might choose to make use of it.

Taking an agnostic stance now would let us experiment
with translation strategies (in the future) which replace
uses of  (which have problematic security
characteristics, even recently) with uses of .

(Reflection might omit off-label uses of , just
like it omits .  But the “guts” of MH reflection
can see  today and would see all such s
tomorrow, so exposing it becomes a library issue,
not a JVMS decision.)

— John

Re: JEP update: Value Objects

2021-11-29 Thread John Rose
P.S. I’d like to emphasize that none of my pleas for caution apply to the
JEP draft titled Value Objects.

That very nice JEP draft merely links to the JEP draft titled Primitive Classes,
which is the JEP with the potential problem I’m taking pains to point out here.

Also, I’m not really demanding a title change here, Dan, but rather asking
everyone to be careful about any presupposition that “of course we will
heal the rift by making all primitives be classes”.  Or even “all primitives
be objects.”  Those are easy ideas to fall into by accident, and I don’t want
us to get needlessly muddled about them as we sort them out.

(Having picked Value as the winner for the first JEP, replacing Primitive
Objects with Primitive Values in the second JEP is not exactly graceful,
is it?  Naming is hard.  If you were to change the title I suggest simply
“Primitives” as the working title, until we figure out exactly what we
want these Primitives to be, relative to other concepts.  Just a suggestion.)

On Nov 29, 2021, at 10:53 PM, John Rose 
mailto:john.r.r...@oracle.com>> wrote:

Two points from me for the record:

1. I re-read the JEP draft now titled Value Objects, and liked everything I 
saw, including the new/old term “Value” replacing “Pure” and “Inline”.

2. In your mail, and in the companion JEP draft titled Primitive Objects, you 
refer to “primitive classes” and their objects.  It would make our 
deliberations simpler, IMO, if we were to title this less prescriptively as 
“Primitives” or “Primitive Types” or “Primitive Types and Values”, rather than 
“Primitive Classes”…



Re: JEP update: Value Objects

2021-11-29 Thread John Rose
Two points from me for the record:

1. I re-read the JEP draft now titled Value Objects, and liked everything I 
saw, including the new/old term “Value” replacing “Pure” and “Inline”.

2. In your mail, and in the companion JEP draft titled Primitive Objects, you 
refer to “primitive classes” and their objects.  It would make our 
deliberations simpler, IMO, if we were to title this less prescriptively as 
“Primitives” or “Primitive Types” or “Primitive Types and Values”, rather than 
“Primitive Classes”, because (a) there’s no logical need for the new things to 
be classes, and (b) it might actually be helpful for them *not* to be, in the 
end, after deliberation.  Putting the word “classes” in the title presupposes 
an answer to deliberations that have not yet been concluded.

People should note that the term “class” and “object” is only loosely bound to 
the term “primitive” in most of our designs, since (of course) today no 
primitives at all are either defined by classes or have objects.  They have 
corresponding reference or box classes and objects, to be precise.  Today a 
primitive type “has a class” but it is not the case that it “is a class”.  We 
could choose to preserve this state of affairs instead of fixing it by making 
“classes everywhere”; it makes some dependent choices easier to make.  As you 
know, one possible bridge to the future is, “Today all types are a disjoint 
union of primitives, classes, and interfaces, and tomorrow the same will be 
true, with all three possessing class-like declarations.”

What about objects, shouldn’t primitives at least be objects?  Well, interfaces 
don’t directly have objects today; they have objects of implementing classes.  
Likewise, primitives need never have objects directly, as long as they have 
objects which properly relate to them—their boxes.  Boxes-boxes-everywhere 
certainly has its downsides, including pedagogical downsides, but that doesn’t 
make it a non-starter.

Instead, if we choose to use the terms “primitive class” and “primitive object” 
as exact counterparts to “reference class” and “reference object”, as your 
chart suggests, Dan, we will have to account for the duplication and/or ad hoc 
division of various attributions of classes and objects between the “primitive  
class” and its corresponding “reference class” (e.g., int.ref, Point.ref).  I 
think a good leading question is, “if a primitive is a class, and its reference 
type is also a class, which of its methods are situated on the primitive class, 
and which are situated on the reference class?”  I would suggest that we be 
more sure we want to have two classes per primitive, or only-a-primitive-class 
per primitive, before we presuppose a decision by putting the word “Classes” in 
the title of JEP 402.

> On Nov 29, 2021, at 4:09 PM, Dan Smith  wrote:
> 
> I've been exploring possible terminology for "Bucket 2" classes, the ones 
> that lack identity but require reference type semantics.
> 
> Proposal: *value classes*, instances of which are *value objects*
> 
> The term "value" is meant to suggest an entity that doesn't rely on mutation, 
> uniqueness of instances, or other features that come with identity. A value 
> object with certain field values is the same (per ==), now and always, as 
> every "other" value object with those field values.
> 
> (A value object is *not* necessarily immutable all the way down, because its 
> fields can refer to identity objects. If programmers want clean immutable 
> semantics, they shouldn't write code (like 'equals') that depends on these 
> identity objects' mutable state. But I think the "value" term is still 
> reasonable.)
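[The ==/equals split that value objects would erase is visible with today's records, which already give the state-based equals() Dan describes while == still observes identity. A minimal sketch in current Java; the record here merely stands in for a future value class:]

```java
// A record has state-based equals(), but == still compares identity.
// For a value class, == itself would compare field values.
record Pt(int x, int y) {}

class ValueEqualityDemo {
    public static void main(String[] args) {
        Pt a = new Pt(1, 2);
        Pt b = new Pt(1, 2);
        if (!a.equals(b)) throw new AssertionError("records compare by state");
        if (a == b) throw new AssertionError("new always makes a fresh identity today");
    }
}
```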
> 
> This feels like it may be an intuitive way to talk about identity without 
> resorting to something verbose and negative like "non-identity".
> 
> If you've been following along all this time, there's potential for 
> confusion: a "value class" has little to do with a "primitive value type", as 
> we've used the term in JEP 401. We're thinking the latter can just become 
> "primitive type", leading to the following two-axis interpretation of the 
> Valhalla features:
> 
> ------------------------------------------+---------------------------
> Value class reference type (B2 & B3.ref)  | Identity class type (B1)
> ------------------------------------------+---------------------------
> Value class primitive type (B3)           |
> ------------------------------------------+---------------------------
> 
> Columns: value class vs. identity class. Rows: reference type vs. primitive 
> type. (Avoid "value type", which may not mean what you think it means.)
> 
> Fortunately, the renaming exercise is just a problem for those of us who have 
> been closely involved in the project. Everybody else will approach this grid 
> with fresh eyes.
> 
> (Another old term that I am still finding useful, perhaps in a slightly 
> different way: 

Re: [External] : Re: EG meeting, 2021-11-17

2021-11-23 Thread John Rose
On Nov 22, 2021, at 5:13 PM, Dan Smith  wrote:
> 
>> On Nov 22, 2021, at 2:07 PM, Kevin Bourrillion  wrote:
>> 
>>> On Mon, Nov 22, 2021 at 6:27 AM Dan Heidinga  wrote:
>>> 
>>> I'll echo Brian's comment that I'd like to understand Kevin's use
>>> cases better to see if there's something we're missing in the design /
>>> a major use case that isn't being addressed that will cause user
>>> confusion / pain.
>>> 
>> Sorry if I threw another wrench here!
>> 
>> What I'm raising is only the wish that users can reasonably default to 
>> B2-over-B1 unless their use case requires something on our list of "only B1 
>> does this". And that list can be however long it needs to be, just hopefully 
>> no longer. That's probably how we were looking at it already.
> 
> Here's the current list, FYI (derived from JEP 401):
> 
>   • Implicitly final class, cannot be extended.

JVMS requires ACC_FINAL on class.

>   • All instance fields are implicitly final, so must be assigned exactly 
> once by constructors or initializers, and cannot be assigned outside of a 
> constructor or initializer.

JVMS requires ACC_FINAL on every instance field.  (Static fields OK.)

>   • The class does not implement—directly or indirectly—IdentityObject. 
> This implies that the superclass is either Object or a stateless abstract 
> class.

JVMS requires a check for this.

>   • No constructor makes a super constructor call. Instance creation will 
> occur without executing any superclass initialization code.

JVMS rules for invokespecial <init> must exclude this.

>   • No instance methods are declared synchronized.

JVMS forbids ACC_SYNCHRONIZED on all instance methods.  (Static methods OK.)

>   • (Possibly) The class does not implement Cloneable or declare a 
> clone() method.
>   • (Possibly) The class does not declare a finalize() method.

A conservative move is to forbid these things, in language and JVMS.
Minor precedent:  record has similar special cases (for component names).

>   • (Possibly) The constructor does not make use of this except to set 
> the fields in the constructor body, or perhaps after all fields are 
> definitely assigned.

JVMS doesn’t care about this.

The private opcodes initialvalue and withfield work to set up ’this’
as the constructor executes.  It’s OK to sample the value at any time,
but maybe the language says, “don’t do that”.

I think there are use cases for private methods to work on partially
initialized stuff.  The theory is tricky.  OK to be conservative now
and more lenient later.
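[For concreteness, a class shaped by the checklist above compiles today: final class, all instance fields final, superclass Object, no synchronized instance methods, no clone() or finalize(), a constructor that only assigns fields. The names here are illustrative, not from the JEP:]

```java
// Shaped like a value-class candidate per the list above:
// final, all instance fields final, extends Object,
// no synchronized instance methods, no clone() or finalize().
final class Range {
    private final int lo;   // assigned exactly once, in the constructor
    private final int hi;

    Range(int lo, int hi) {
        if (lo > hi) throw new IllegalArgumentException("lo > hi");
        this.lo = lo;
        this.hi = hi;
    }

    int length() { return hi - lo; }

    // Static state and static synchronized methods remain OK.
    static synchronized Range unit() { return new Range(0, 1); }
}
```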

> 
> And elaborating on IdentityObject & stateless abstract classes:
> 
> An abstract class can be declared to implement either IdentityObject or 
> ValueObject; or, if it declares a field, an instance initializer, a non-empty 
> constructor, or a synchronized method, it implicitly implements 
> IdentityObject (perhaps with a warning).

JVMS should enforce corresponding structural rules on loaded classfiles.
Neither a source class-or-interface nor a loaded classfile can ever
implement both IO and VO at the same time.

As a special feature in the JVM I want an explicit form for these
“empty constructors”.  We’ve discussed this; I’m not sure which form
is best, but I don’t want it to be a “not-really-empty” constructor which
has a super-call in it; that’s what seemingly “empty” constructors look
like today to the JVM.

The JVM should both allow and require an empty constructor if
and only if the abstract class implements VO.  (Alternative:
The JVM implicitly injects VO if it sees an empty constructor,
and if it sees VO it looks for an empty constructor.)

IIRC maybe our last consensus was to add an attribute to an
<init> method of signature ()V that says, “whatever you think
you see in this method, Mr. VM, please also feel free to skip it.”
That’s a more hacky way to specify an empty constructor than
would be my preference (which is an ACC_ABSTRACT <init> ()V
or even a zero-length class attribute).  If a VO-only abstract
has an <init> ()V method, that’s a smell, because it will never
be used!  OTOH, maybe just being a VO-only abstract class is
enough to tell the JVM that the constructor is empty, with
no further markings.  Anyway, there’s a little corner of the
design space to consider here.

> Otherwise, the abstract class extends neither interface and can be extended 
> by both kinds of concrete classes.

Such a class is very handy.  It needs *both kinds of constructors*.

Are you thinking that just mentioning the special VO super is
enough to trigger inclusion of an empty constructor?  That’s
probably a good move.  Is this the *only* way to request an
empty constructor, or is there a way to make an explicit
empty constructor?  (I mean a really-empty one, not just
today’s seemingly-empty ones.  Even Object’s empty constructor
has an areturn instruction, so it’s not really empty.)

> (Such a "both kinds" abstract class has its ACC_PRIM_SUPER—name to be 
> changed—flag set in the class file, along with an <init> method for 

Re: EG meeting, 2021-11-17

2021-11-22 Thread John Rose
Thanks, Brian, for many useful suggestions about the diagram.

I have updated it in place.  Its message should be clearer now.

On Nov 21, 2021, at 9:05 PM, John Rose wrote:

http://cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf




Re: identityless objects and the type hierarchy

2021-11-21 Thread John Rose
On Nov 4, 2021, at 3:25 PM, Remi Forax wrote:


I don't think a second bifurcation is needed.
At runtime bucket 2 and bucket 3 behave the same apart from null.
Given that IdentitylessObject (or whatever name we choose) is an interface, 
it always accepts null,
so if they are typed as that interface, B2 and B3 behave exactly the same.


Piling on:  The marker interfaces are useful for
testing and bounding *reference types*.  But
a primitive type is not a reference type, so it
cannot be (directly) tested or bounded as a
reference.

There *is* a difference between a reference
of the form B3.ref (B3.box, B3? whatever)
and B2.  But it’s not an interesting difference,
because when you box a B3 primitive you
get something which has (as Brian says)
all the affordances of reference, but
without object identity.  That’s exactly
what a B2 type is.  The only difference
between a reference to a B3 type and a
B2 type is the syntax by which they were
declared and derived.

This looked pretty clear to me when I
did my diagram, where B3 types have
ref projections that bubble into the
B2 swath of types.  Once there, they
behave exactly like native B2 types.

The diagram has three swathes for
concrete types (PRIM, NOID, IDOSAUR),
plus a separate upper quadrant for
non-concrete reference types.
The PRIM swath has a little excrescence
into the NOID swath where the P.ref
types pop out.

http://cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf

All that suggests to me that we won’t want
a marker interface to specially distinguish the
B3 excrescences.

It does also suggest that we are not done
bike-shedding terms:  What’s the collective
term for “B2 refs + B3 boxes”?  (I used NOID.)
Or, is a B3 box a “pure object” like any B2
pure object, whose class happens to be a
primitive class?  I dunno.

It remains true (and I hope will continue to
be true) that a B3 class defines two types,
one reference and one non-reference, while
a B2 class defines one reference type.
But maybe those two reference types are
both to “pure objects”?  I’ll bet Dan has
a take on this.



Re: [External] : Re: EG meeting, 2021-11-17

2021-11-21 Thread John Rose
On Nov 19, 2021, at 5:32 AM, Brian Goetz wrote:

And this is not inconsistent with abstract superclasses contributing fields.

For me the poster child is Enum as much as Record.  I want pure
enums, some day, but in order to make this work we need a way for
the ordinal and name fields to (a) appear in the abstract class Enum
and (b) be suitably defined in the layout of each Enum subclass,
whether it is an identity subclass or a pure (B2) subclass.

Sketch of an example way forward (but still with the sense that we
have more important things to do):

 - Allow fields to be marked abstract, and mark Enum’s fields that way.
 - Do not require (or allow) constructors to initialize abstract fields.
 - The JVM can support virtualized getfield, maybe, or just ask the T.S. to use 
access methods.
 - As with methods, require a concrete subclass to redeclare inherited 
abstracts.
 - The concrete subclass will naturally declare and initialize the now-concrete 
field.
 - Have Enum support both kinds of constructors: Old School (fully concrete) 
and empty.
 - Figure out some story for concretifying Enum’s fields for Old School clients.

The trick would be to configure Enum so that it was a fully functional
super for both kinds of subclasses; it should behave one way to Old
School enum subclasses and another way to B2 enum subclasses.

It’s a research project.  I get the sense there’s a path forward, but
not a simple one.
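[The “access methods” fallback mentioned in the sketch can be written in today’s Java: the super declares abstract accessors in place of abstract fields, and each concrete subclass supplies the storage. Illustrative only; real java.lang.Enum cannot be retrofitted this way from user code:]

```java
// Sketch: the superclass declares the *contract* for ordinal/name;
// each concrete subclass holds its own now-concrete fields.
abstract class EnumLike {
    abstract int ordinal();     // "abstract field" via an access method
    abstract String name();
}

final class Color extends EnumLike {
    private final int ordinal;      // storage lives in the concrete subclass
    private final String name;
    Color(int ordinal, String name) { this.ordinal = ordinal; this.name = name; }
    @Override int ordinal() { return ordinal; }
    @Override String name() { return name; }
}
```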

If you exclude fields, then it’s not as hard as a research project IMO.
The abstract supers of a B2 are not themselves B2; they are polymorphic
types that (conventionally) live in the Old Bucket.



Re: EG meeting, 2021-11-17

2021-11-21 Thread John Rose
On Nov 18, 2021, at 2:58 PM, Remi Forax wrote:

I suppose you are talking about empty (no field) abstract classes.
We need that for j.l.Object, j.l.Number or j.l.Record.

From a user POV, it's not very different from an interface with default methods.

Yes.  The key thing is that the abstract class in question
does not accidentally entangle itself with object identity.
There are three ways off the top of my head to do that:

 - have a constructor that needs to write fields through `this`
 - have a mutable instance field
 - have synchronization somewhere (a synch. method)

We’ll need to have a way for an abstract class (for Record,
for example) to stand clear of the object identity thicket.

I think we could allow such an abstract class to have final
fields, with suitable restrictions.  But it would require
a complex translation strategy and/or tricky JVM support.
The problem is that the fields in the super would have to
be replicated into each concrete subclass in a physically
separate manner.  Also the fields would have to have their
initialization declared by the superclass but defined by
the concrete subclass.  Also field access might need to be
virtualized, if each concrete subclass has its own idea
about where the field “lives” in its bundle of fields.
It’s doable but messy.  I’d rather leave it for later; we
have so many more worthwhile things to do.



Re: EG meeting, 2021-11-17

2021-11-21 Thread John Rose
Yes.  One way I like to think about the Old Bucket is
that it is characterized by *concrete* representations
which have somehow opted into object identity.

Confusingly, the Old Bucket also contains interfaces
which are non-concrete and also Object, which might
as well be non-concrete.  (I’m not saying “abstract”
because that’s a keyword in the language, and you
can have semi-concrete classes which are abstract
but also commit to object identity and may even
have mutable fields or by-reference constructors,
like AbstractList.)

Those are the two interesting populations in the
Old Bucket:  Concrete classes that are entangled
with object identity (until they can be migrated,
or forever in many cases).  And, non-concrete
classes, which are necessarily polymorphic.

Those two kinds of types (in the Old Bucket)
interact with the New Buckets in distinct ways.

There’s a middle case which is causing problems
here:  A class can be concrete *and* polymorphic,
meaning that subclasses can add more stuff.
(The parent class could be declared abstract
or not; that’s not an important detail.)

A class that is concrete *and* polymorphic is
exactly one that plays the classic game of object
oriented subclasses, where data fields and methods
are refined in layers.

This classic game does not translate well into
the by-value world; it needs polymorphic pointers.
Just consult any C++ style guide to see what happens
if you unwarily try to mix by-value structs and
class inheritance:  You shouldn’t, according to the
guides.

Is there a way to make that work in Java, so that
identity-free classes can inherit from each other?
Probably, in some limited way.  The simplest move
is the one Brian and I are liking here, where a
completely non-concrete class (one with no fields
and no commitment to object identity) can be
refined by a subclass.  But it should be marked
abstract, so as not to have cases where you have
a variable of the super-type and you don’t know
whether it has the layout of the super (because
it was concrete, oops) or a subtype.

The division separating non-concrete types from
identity-object types in the Old Bucket may be
seen in this diagram, which I cobbled up this
weekend:

http://cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf

This comes from my attempts to make a more or
less comprehensive Venn-style diagram of the stuff
we are talking about.  I think it helps me better
visualize what we are trying to do; maybe it will
help others in some way.

I view this as my due diligence mapping the side of the
elephant I can make contact with.  Therefore I’m happy
to take corrections on it.

I’m also noodling on a whimsical Field Guide, which asks
you binary questions about a random Java type, and guides
you towards classifying it.  That helped me crystallize
the diagram, and may be useful in its own right,
or perhaps distilled into a flowchart.  Stay tuned.

— John


On Nov 18, 2021, at 2:34 PM, Brian Goetz wrote:

I think it is reasonable to consider allowing bucket two classes to be 
abstract.  They could be extended by other classes which would either be 
abstract or final. The intermediate types are polymorphic but the terminal type 
is monomorphic.

A similar argument works for records.

Sent from my iPad

On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion wrote:


On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga wrote:

Let me turn the question around: What do we gain by allowing
subclassing of B2 classes?

I'm not claiming it's much. I'm just coming into this from a different 
direction.

In my experience most immutable (or stateless) classes have no real interest in 
exposing identity, but just get defaulted into it. Any dependency on the 
distinction between one instance and another that equals() it would be a 
probable bug.

When B2 exists I see myself advocating that a developer's first instinct should 
be to make new classes in B2 except when they need something from B1 like 
mutability (and perhaps subclassability belongs in this list too!). As far as I 
can tell, this makes sense whether there are even any performance benefits at 
all, and the performance benefits just make it a lot more motivating to do what 
is already probably technically best anyway.

Now, if subclassability legitimately belongs in that list of 
B1-forcing-factors, that'll be fine, I just hadn't fully thought it through and 
was implicitly treating it like an open question, which probably made my 
initial question in this subthread confusing.



--
Kevin Bourrillion | Java Librarian | Google, Inc. | 
kev...@google.com



Re: [External] : Re: Consolidating the user model

2021-11-03 Thread John Rose
On Nov 3, 2021, at 4:05 PM, Dan Smith wrote:

(It is, I suppose, part of the model that objects of a given class all have a 
finite, matching layout when accessed by value, even if the details of that 
layout are kept abstract. Which is why value types are monomorphic and you need 
reference types for polymorphism.)

The fact that the VM often discards object headers at runtime is a pure 
optimization.

Let’s see what happens if we say that (a) bare values have headers and (b) 
Object::getClass allows the user to observe part of the header contents.

It follows then that the expression aPointVal.getClass() will show the contents 
of aPointVal’s header, even if it is a compile-time constant.

Point pv = new Point(42,42);  // “class Point” is the definition of Point
assert pv.getClass() == Point.class;  // ok, that’s certainly the class
assert pv.getClass() != Point.ref.class;  // and it’s not a ref, so good

That is all fine.  There’s a little hiccup when you “box” the point and get the 
same Class mirror even though the “header” is a very real-heap resident value 
now:

Point.ref pr = pv;  // same object… now it’s on the heap, though, with a real 
live heap header
assert pr.getClass() == Point.class;  // same class, but...
assert pr.getClass() != Point.ref.class;  // we suppress any distinction the 
heap header might provide

There’s a bigger hiccup when you compare all that with good old int:

int iv = 42;  // “class int” is NOT a thing, but “class Integer” is
assert iv.getClass() != int.class;  // because int is not a class
assert iv.getClass() == Integer.class;  // ah, there’s the class!
assert iv.getClass() == int.ref.class;  // this works differently from Point
assert ((Object)iv).getClass() == iv.getClass();  // this should be true also, 
right?

And to finish out the combinations:

int.ref ir = iv;  // same object… now it’s on the heap, though, with a real 
live heap header
assert ir.getClass() == Integer.class;  // same class
assert ir.getClass() == int.ref.class;  // and this time it’s a ref-class (only 
for classic primitives)
assert ir.getClass() != int.class;

All this has some odd irregularities when you compare what Point does and what 
int does.  And yet it’s probably the least-bad thing we can do.

A bad response would be to follow the bad precedent of ir.getClass() == 
Integer.class off the cliff, and have pv.getClass() and pr.getClass() return 
Point.ref.class.  That way, getClass() only returns a ref.  Get it, see, 
getClass() can only return reference types.  The rejoinder (which Brian made to 
me when I aired it) is devastating:  Point.class is the class, not 
Point.ref.class, and the method is named “get-class”.

Another approach would be to fiddle with the definitions of val.getClass(), so 
as to align iv.getClass() with pv.getClass() with their non-ref types.  But 
that still leaves pv.getClass() unaligned (in its non-ref-ness) with 
ir.getClass() (in its ref-ness).  We still expect Point.class as the answer 
from *both* pr.getClass() and pv.getClass().

Or we could try to make the problem go away by simply outlawing (statically) 
instances of expr.getClass() that expose inconvenient answers.  Such moves 
score high on the “Those Idiots” score card.  And they still don’t align the 
ref-ness of pr.getClass() vs. ir.getClass().

Maybe we only earn partial Idiot Points if we outlaw iv.getClass() but allow 
pv.getClass()?  Same amount of seam, different shape of seam, IMO.

Another source of constraint is that we expect that up-casting anything to 
Object and then re-querying should not change the answer.  (This is another way 
of saying that the header should stay the same whether it is in the heap or 
not.)  It is one of the reasons that iv.getClass() should not return int.class.

assert ((Object)pv).getClass() == pv.getClass();  // this should be true also, 
right?
assert ((Object)pr).getClass() == pr.getClass();  // this should be true also, 
right?
assert ((Object)iv).getClass() == iv.getClass();  // this should be true also, 
right?
assert ((Object)ir).getClass() == ir.getClass();  // this should be true also, 
right?

This is an over-constrained problem.  I don’t know how to make it look more 
regular, and I think (after doing some more exhaustive analysis off-line) there 
aren’t any other ideas we haven’t examined.

(I’m saying that partly in a superstitious hope that, having said it, someone 
will of course prove me wrong.)
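[The Integer-flavored constraints above can be partially checked against today’s semantics, where boxing is the only way an int meets getClass():]

```java
class GetClassToday {
    public static void main(String[] args) {
        int iv = 42;
        Object boxed = iv;  // boxing conversion; iv itself has no getClass() today
        if (boxed.getClass() != Integer.class) throw new AssertionError();
        if (boxed.getClass() == int.class) throw new AssertionError();
        // Up-casting never changes the answer: the header travels with the object.
        Object o2 = (Object) boxed;
        if (o2.getClass() != boxed.getClass()) throw new AssertionError();
    }
}
```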


I'm claiming this picture makes explaining the feature harder, unnecessarily. 
An unhoused value floating around somewhere that I can somehow have a reference 
to strikes me as quite exotic. Tell me it's just an object and I feel calmer.

Yes, it's just an object. :-)

But not quite how you mean. The new feature here is working with objects 
*directly*, without references. I think one thing you're struggling with is 
that your concept of "object" includes the reference, and if we take that away, 
it doesn't quite seem 

Re: [External] : Re: Consolidating the user model

2021-11-03 Thread John Rose
On Nov 3, 2021, at 11:34 AM, Brian Goetz  wrote:
> 
> There's lots of great stuff on subtyping in chapters 15 and 16 of TAPL (esp 
> 15.6, "Coercion semantics"), which might be helpful.  But as a tl;dr, I would 
> suggest treating subtyping strictly as an is-a relation within our nominal 
> type system.  By this interpretation, int-to-long and int-to-Object are both _conversions_.  

Yes, that’s good.  So when someone tries to say “int <: long” or “int <: 
Object” our response would be “sorry, you are talking about a different idea of 
types”.  Something like “int <: Object” is a conversion the object can do, not 
two ways of viewing the whole object.  For us, types are about is-a, not 
is-a-member-of-larger-set (a disguised has-a) or can-do-a-conversion (another 
disguised has-a).
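[That reading matches what the language does today: int-to-long and int-to-Object assignments compile, but via widening and boxing conversions rather than subtype views, and the boxed witness reports Integer as its class.]

```java
class ConversionNotSubtyping {
    public static void main(String[] args) {
        int i = 42;
        long l = i;    // widening primitive conversion, not a subtype view
        Object o = i;  // boxing conversion: the int has-a box; it is-not-a Object
        if (o.getClass() != Integer.class) throw new AssertionError();
        if (l != 42L) throw new AssertionError();
    }
}
```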

That does lead us to the next hard problem:  Which is that a value is not a 
box, it has a box.  And yet we want reflection (getClass specifically) not to 
make a distinction between those two distinct entities, but assign them the 
same class.  Which is fine, except that Classes have grabbed some of the jobs 
of types.

Brainstorming here:  We might be happier with a method called getRuntimeType 
which is allowed to return different values when applied to a box/ref of Point 
and a value of Point.  And then we notice that Point values don’t have 
inheritance or super types, directly, so the method paradigm (getRuntimeType 
being a method) is overkill; these are all statically bound methods.

And yet, there is a strong constraint that such a method, in its statically 
bound form, should return the same value as the corresponding call when applied 
to a box (under the type Object, maybe).  I don’t know how to untie this knot 
completely.

Brainstorming again:  getRuntimeType applied to a value can (and should?) be 
constant folded at compile time.  (Same point for getClass in fact.)  When 
applied to a ref it cannot (usually).  This makes getRuntimeType feel even less 
like an object method, but more like a __RuntimeTypeOf[ … ] syntax (no 
bikesheds were painted in the production of this statement).

I’m thinking that those few users who want to extract type mirrors from 
(non-null) witnesses will need to specify manually which type projection they 
are expecting, rather than hope that the the result they want will pop out.

Not this:
  Class<?> cp = point.getClass(); //Point
  Class<?> ci = anint.getClass(); //Integer (aka int.ref)

but this:
  Class<?> cp = point.getClass().valueType(); //Point
  Class<?> ci = anint.getClass().valueType(); //int

or else this:
  Class<?> cp = point.getClass().referenceType(); //Point.ref
  Class<?> ci = anint.getClass().referenceType(); //Integer

In other words, if the rift between Integer and Point is not completely healed, 
users can probably work around the problems.



Re: [External] : Re: Consolidating the user model

2021-11-03 Thread John Rose
On Nov 3, 2021, at 10:23 AM, Kevin Bourrillion wrote:

I think this fits neatly with the current design: `Point` has no supertypes*, 
not even `Object`, but `Point.ref` does.

(*I mean "supertype" in the polymorphic sense, not the "has a conversion" sense 
or the "can inherit" sense. I don't know what the word is really supposed to 
mean. :-))

Slippery terms.  “Type” is hopelessly broad as is “super type”.

For types as value sets, a super type is a value super set.
Again, int <: long in this view, and even in the JLS.

For types as in an object hierarchy, a super type is a parent+
type, an upper limit in the hierarchy lattice.  That view
centers on object polymorphism and virtual methods,
and is suspiciously bound up with pointer polymorphism.
So String <: Object in this view.

To heal the rift we are groping towards int <: Object, but
we don’t fully know which kind of “<:” that is, and how
it breaks down into a value set super, an object hierarchy
super, or perhaps something further.  The best view we
have so far, IMO, is that int <: Object breaks apart into
int <: int.ref (value set) and int.ref <: Object (hierarchy).
In that view, the last link of int <: int.ref requires a
story of how methods “inherit” across value sets,
without the benefit of a pointer-polymorphic hierarchy
to inherit within.  It’s doable, but we are running
into the sub-problems of this task.



Consequences of null for flattenable representations

2021-11-03 Thread John Rose
As we just discussed in the EG, allowing null to co-exist
with flattenable representations is a challenge.  It is
one we have in the past tried to avoid, but the very
legitimate needs for (what we now call) reference
semantics for all of Bucket 2 and some of Bucket 3
require us to give null a place at the table, even while
continuing to aim at flattening nullable values,
when possible.

A good example of this is Optional, migrated from a
Bucket 1 *value-based class* to a proper Bucket 2
*reference-based primitive*.   (See that tricky change
in POV?)  Another example to keep in mind is the
reference projection of a Bucket 3 type such as
Complex.ref or Point.ref.

The simplest way to support null is just to do what
we do today, and buffer on the heap, with the option
of a null reference instead of a reference to a boxed value.

(We call such things “buffers” rather than “boxes” simply
because, unlike int/Integer, the type of thing that’s in
the box might not be denotably different from the type
of the “box” itself.)

The next thing to do is inject a *pivot field* into the flattened
layout of the primitive object.  When this invisible field
contains all zero bits, the flattened object encodes a null.
All the other bits are either ignorable or must be zero,
depending on what you are trying to do.

This idea splits into two directions:  How to work with
“pivoted” non-null values, and how to represent the pivot
efficiently. Both lines of thought are more or less required
exercises, once you allow null its place at the table.

We know where null comes from (the null literal and
aconst_null).   Where do pivoted values come from?
You need an original source of them for the initial
value of “this” in the primitive constructor (a factory
method at the bytecode level).  Specifically, you need
that bit pattern which is almost but not quite all
zero bits; the pivot field is set to the “non-null”
state but all other field values are zero.  Then
the constructor can get to work.

This might be the job of an “initialvalue” bytecode,
which is a repackaging of the “defaultvalue” bytecode.
Given a suitable definition with suitable restrictions
for initialvalue, a constructor uses a mix of initialvalue
and withfield executions to get to its output state for “this”.
None of the intermediate states would be confusable
with null.

(We sometimes assumed, wrongly in hindsight, that
doing this simply requires assigning “this” to
null in the constructor and then special-casing
withfield and maybe getfield to allow a null input
and maybe a null output.  But this is a thicket of
tangles and irregularities, and it doesn’t quite
get rid of the need for a separate operation to
actually set the pivot field.  Basically, once null
gets entrenched, defaultvalue has to turn into
initialvalue, or so it appears to me at this moment.)

Once the constructor returns a non-null set of
bits, all subsequent assignments continue to
separate null from non-null.  That’s true even
for racy assignments, assuming that pivot field
states are individually atomic, even if they race
relative to other fields.

(Race control might be important for Bucket 3
references like Complex.ref, if we ever try to
flatten those.  I’m digressing; my focus is to
build out Bucket 2, which suppresses such races.)

To allow Bucket 2 constructors control over their
outputs, it follows that initialvalue (unlike its
earlier version defaultvalue) must be restricted
to those same contexts where withfield is allowed.
Either to constructors only (for the same class)
or to the capsule (nest) of that class.

OK, so how is the pivot field physically represented?
Again, we have discussed this in years past, but I’ll
summarize some of the thinking:

1. It can be just a boolean, a byte or a packed bit
that is made free somehow.  A 65th bit to a 64-bit
payload perhaps.  This is sad, but also hard to get
around when every single bitwise encoding in the
existing layout already has a meaning.

But the payload of the primitive type might use a
field with “slack”, aka unused bitwise encodings.
We can pounce on this and use bit-twiddling
to internally reserve the zero state, and declare
that when that field is zero, it is the pivot field
denoting null, and when it is non-zero it is
doing its normal job.
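[That zero-reserving trick can be sketched with ordinary arithmetic, using the humble increment: shift the payload’s domain by one so the all-zero bit pattern is freed up to mean null. A hypothetical user-level model, not JVM code; it presumes the field has slack, i.e. the all-ones encoding is unused by the payload:]

```java
// Hypothetical model of a pivoted field: raw == 0 encodes null,
// otherwise raw encodes (value + 1), reserving the all-zero state.
// Requires slack: value must never be Integer.MAX_VALUE here.
final class PivotedField {
    static int encode(Integer v) { return v == null ? 0 : v + 1; }
    static Integer decode(int raw) { return raw == 0 ? null : raw - 1; }
}
```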

2. If the language tells us, “yes I promise not
to use the default value on this field” then maybe
the JVM can do something with that promise.
There are issues, but it’s tempting for (say)
a Rational type where the denominator is
never zero.

3. More reliably, if the JVM knows that the
a field has unused encodings, it can just swap
the all-zero state with some other state.
People will immediately think of unused bits
which can be flipped to true in the field
when it is pivoted to non-null.

It’s better, IMO, to start out with the humble
increment operator (rather than the bit-set
operator) and work from there.  As long as
the encoding of all-one-bits is not taken,
for a given field (true for booleans 
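The increment trick can be sketched outside the JVM, too. In this hypothetical encoding (names are mine), the stored all-zero pattern means null and every real payload is stored shifted up by one, in a channel one bit wider than the payload:

```java
// Hypothetical sketch of the "humble increment" null encoding: a stored
// value of 0 means null, and a real 32-bit payload p is stored as p + 1
// in a 33-bit-capable channel (modeled here as a long).
final class NullChannel {
    static long encode(Integer payload) {
        if (payload == null) return 0L;                   // all-zero encodes null
        return (payload.intValue() & 0xFFFF_FFFFL) + 1L;  // shift the real range up by one
    }

    static Integer decode(long stored) {
        if (stored == 0L) return null;
        return (int) (stored - 1L);                       // shift back down
    }
}
```

Every legitimate int survives the round trip, at the cost of the extra bit of channel width the text mentions.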

Re: [External] : Equality operator for identityless classes

2021-11-03 Thread John Rose
One of the long standing fixtures in the ecosystem is the
set of idioms for correct use of op==/acmp.  Another is lots
of articles and IDE checkers which detect other uses which
are dubious.  It’s a problem that you cannot use op==/acmp
by itself in most cases; you have to accompany it by a call
to Object::equals.  We might try to fix this problem, but
it cannot be expunged from our billions of lines of
pre-existing Java code.

I like to call these equals-accompanying idioms L.I.F.E,
or Legacy Idiom(s) For Equality.  It shows up, canonically,
in this method of ju.Objects:

public static boolean equals(Object a, Object b) {
    return (a == b) || (a != null && a.equals(b));
}

Thus, the defective character of op==/acmp is just
(wait for it) a fact of L.I.F.E. and we cannot fight it too
much without hurting ourselves.

Turning that around, if L.I.F.E. is a dynamically common
occurrence (as it is surely statically common) then we
can expend JIT complexity budget to deal with it, and
(maybe even) adjust JVM rules around the optimizations
to make more edgy versions of the optimizations legal.

Specifically, this JIT-time transform has the potential to
radically reduce the frequency of op==/acmp:

   (a == b) || (a != null && a.equals(b))
=>
  (a == null ? b == null : a.equals(b))

This only works if all possible methods selected from
a.equals permit the dropping of op==.  The contract
of Object::equals does indeed allow this, but it is not
enforced; the JVMS allows the contract to be broken,
and the transform will expose the breakage.  And yet,
there are things we can do here to unlock this transform.
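A small sketch in ordinary Java (names are mine) shows both forms side by side, and how a contract-breaking equals is exactly where the rewrite would expose breakage:

```java
// The legacy L.I.F.E. idiom versus the proposed rewrite.  For a
// contract-honoring (reflexive) equals the two always agree; the
// deliberately broken NeverEqual class violates reflexivity, which is
// the kind of breakage the rewrite would expose.
final class AcmpRewrite {
    static boolean life(Object a, Object b) {        // legacy idiom, acmp first
        return (a == b) || (a != null && a.equals(b));
    }

    static boolean rewritten(Object a, Object b) {   // drops the acmp fast path
        return (a == null) ? (b == null) : a.equals(b);
    }

    static final class NeverEqual {
        @Override public boolean equals(Object o) { return false; } // non-reflexive!
        @Override public int hashCode() { return 0; }
    }
}
```

For compliant classes the JIT may substitute one form for the other freely; for `NeverEqual` the two forms disagree on `(x, x)`, which is why the transform needs either a compliance guarantee or entry-point splitting.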

More generally, for other L.I.F.E.-forms, I am confident
we can build JIT transforms that reduce reliance on
acmp, which is suddenly more expensive than its coders
(and the original designers of Java) expect.

Programmers who override Object::equals to (as you
nicely say) disavow identity-based substitutability
will probably write, prompted by their IDE, in a
ceremonial mood, that one occurrence of op==/acmp
to short-circuit the rest of their Foo::equals method.
Or they may erase it, in a purifying mood.

In either case, the above transform requires the JIT
to examine such methods as either actually or potentially
starting with a short-circuiting op==/acmp.
In any case, such an identity comparison will be
monomorphic in the receiver type, not a
polymorphic multi-way dispatch on Object
references.

So this is not just moving around costs that stay the
same; you can de-virtualize op==/acmp by moving
it into the prologue of all Object::equals methods.
(Non-compliant ones can be handled by splitting
the entry point.)  Once the actual or potential
op==/acmp is found at the start of Foo::equals, we
can then inline and reorder the checks in the body
of the equals method.  At that point the cost of op==
starts to go to zero.

This is old news; we’ve discussed it in Burlington
now these many years ago.  But I thought I’d remind
us of it.  And this is really a more hopeful approach
to L.I.F.E.  That is, even if we don’t do these JIT
transforms in the first release, there is a path forward
that eventually removes the unintentional costs of
op==/acmp when L.I.F.E. throws them at us.

All this can work without requiring a global move to a
completely new operator (op===), surely an alien form
of L.I.F.E. within our ecosystem.

(Ba-DUM-ch!)



Re: [External] : Re: Consolidating the user model

2021-11-03 Thread John Rose

On Nov 2, 2021, at 4:53 PM, Brian Goetz wrote:


Actually, that makes me start to wonder if `getClass()` should be another 
method like `notify` that simply doesn't make sense to call on value types. 
(But we still need the two distinct Class instances per class anyway.)

You could argue that it doesn't make sense on the values, but surely it makes 
sense on their boxes.  But its a thin argument, since classes extend Object, 
and we want to treat values as objects (without appealing to boxing) for 
purposes of invoking methods, accessing fields, etc.  So getClass() shouldn't 
be different.

One way to thicken this thin argument is to say that Point is not really a 
class.  It’s a primitive.  Then it still has a value-set inclusion relation to 
Object, but it’s not a sub-class of Object.  It is a value-set subtype.

It’s probably fruitless, but worth brainstorming as a heuristic for possible 
moves, so… we could say that:

- Point is not a class, it’s a primitive with a value set
- Point is not a subclass of Object, it’s a subtype (with value set conversion, 
like int <: long)
- !(Point *is a* Object) & (Point *has a* Object box)
- Point does not (cannot) inherit methods from Object
- Point can *execute* methods from Object, but only after value-set mapping



Re: Consolidating the user model

2021-11-02 Thread John Rose
On Nov 2, 2021, at 3:44 PM, Kevin Bourrillion wrote:


 new X().getClass() == X.class

Seems like part of the goal would be making it fit naturally with the current 
int/Integer relationship (of course, `42.getClass()` is uncommitted to any 
precedent).

It seems like `Complex.class` (as opposed to `Complex.ref.class`) would never 
be returned by `Object.getClass()` in any other condition than when you could 
have just written `Complex.class` anyway.

Actually, that makes me start to wonder if `getClass()` should be another 
method like `notify` that simply doesn't make sense to call on value types. 
(But we still need the two distinct Class instances per class anyway.)


Yep, you hit on a tricky spot there.  One part of the problem
is that getClass, specifically and uniquely, has a special relation
to the primitive types which is coupled to the typing of
class literals like int.class (which is Class<Integer>, not Class<int>).
Also, Integer is a class, and Complex is a class, but they have
different “tilts”:  Integer is (kinda sorta) int.ref but Complex
is not Complex.ref, and the mirrors reflect this difference.

Sorting this out seems to be an overconstrained problem.

As you say, we have not yet applied “.getClass” to any
non-ref type, but we will certainly do so, and that’s
when the fun begins.

Also, trying to retype int.class as Class<int> is a related
part of the fun.
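For grounding, here is what today’s (pre-Valhalla) Java already does with int’s two mirrors; these are the facts the discussion starts from:

```java
// Pre-Valhalla facts: int already has its own class mirror, distinct
// from Integer's, and getClass() on a boxed value reports the box.
final class MirrorFacts {
    static void check() {
        Object boxed = 42;  // autoboxing yields an Integer
        if (boxed.getClass() != Integer.class)
            throw new AssertionError("getClass reports the box");
        if (int.class == Integer.class)
            throw new AssertionError("the two mirrors are already distinct");
        if (int.class != Integer.TYPE)
            throw new AssertionError("int.class is the primitive mirror");
        if (!int.class.isPrimitive())
            throw new AssertionError("int.class answers isPrimitive()");
    }
}
```

So the “two distinct Class instances per class” already exist for int/Integer; the open question above is which mirror `getClass()` should report for a value like Complex.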

In the end, however nicely we “heal the rift” between
good old int and his new friend Complex, there will
surely be some scars on good old int from his time
marooned (with just a few friends) in primitive-land.

(My current mental metaphor for the isolation of int
is Gilligan, who had about the same number of
unfortunate island-mates as int does.)



Re: Consolidating the user model

2021-11-02 Thread John Rose
On Nov 2, 2021, at 3:44 PM, Kevin Bourrillion wrote:

Btw, am I right that for the middle bucket, `==` will fail (at compile-time 
when possible)?


I don’t see how middle bucket references, which behave very
much like old-bucket references (id-classes), would tend to
fail on ==/acmp any more than old-bucket references.

Example please?

If X is an old-bucket or middle-bucket type, then all of
these are OK and lead to expected results:

X x, x1;
x == x
x == x1
x == null

If Y is a class which is statically disjoint from X, then
these may fail, but not through any bucket-related
effect:

Y y;
x == y  //error: incomparable types: X and Y

I think I’m missing your point…


Re: Consolidating the user model

2021-11-02 Thread John Rose
+100; great summary

> On Nov 2, 2021, at 2:18 PM, Brian Goetz  wrote:
> 
> which means that the `L*` types might work out here.   Stay tuned for more 
> details.  

A footnote, FTR, about L*-descriptors, in case it doesn’t ring a bell.

Brian is referring here to the thing we have talked about several
years ago, of loosely coupling a side-record with an occurrence
of L-Foo that means “link like L-Foo, but load and adapt like Q-Foo”.
We went through some of these iterations even before we settled
on Q-descriptors; they are back again, but in a far more tractable
form we think.

L* is not a new descriptor, it’s just an L (so it links to plain L’s)
but some sort of star-like marking * (not really in the descriptor
string but a side channel!) alerts the JVM to do extra loading
and adapting.

So, one current vision of this side-channel is a very limited early
use of the “Type Restriction” mechanism, as mentioned in the
Parametric VM proposal and elsewhere.  The idea is that a type
L*-Foo would be TR-ed to itself (Foo.class) and since TR’s use
eager loading (of the content of the TR, not of the type it
applies to) the effect would be similar to a Q-Foo, but it
would still be spelled L-Foo.  To avoid implementation
burdens, the JVM would not accept any more “interesting”
TRs, until we need to build them out for specialized generics.
Or we’d just have a one-shot, purpose-built side channel
which smells like an infant sibling to an eventual real T.R.
feature.  A T.R. that really restricts a type (instead of
just asks the JVM to take a closer look a la Q-Foo) is a
much deeper implementation challenge, since it creates
possible failure points when restrictions are violated.
An L* cannot violate itself since the value set is the same.
This is why L* only works on the middle bucket.

L*-Foo (using TRs or any other side-channel) is not a perfect
substitute for Q-Foo, because the stars “rub off too easily”
to ensure rigid correspondence between callers and callee.
This means L*-based API linkage requires more speculation
and runtime checking, compared to Q-based API linkage.

Although it may seem odd, there are a number of practical
reasons to use L* in the middle bucket but Q in the left
bucket.  The left bucket needs two descriptors, so L/Q.
The middle bucket has just one class mirror, so either Q
or else a mix of L and L*, and it needs some story for
migration for a few of its citizens, so L* looks good
again (linking with legacy L with a dynamic mixup).

As Brian says, we may elect to use Q uniformly for the
middle bucket, and handle the migration problem
another way.  It would be good if we could decide
Q vs. L* for the middle bucket without co-solving
the migration problem.

Anyway, such smaller details are up in the air.  The
points in Brian’s message are the high-order bits, and
the stuff I’ve shared here is a footnote.  Please do give
the high-order bits your best attention.  It’s a really
good write-up.

— John

Re: Revisiting default values

2021-07-01 Thread John Rose


> On Jul 1, 2021, at 5:48 AM, Brian Goetz  wrote:
> 
> 
>> 
>> Which reminds me:  I think we should
>> allow calls to methods on `this` inside a
>> constructor.  (Do we?)  A clean way to
>> statically exclude incomplete values of `this`
>> would be to outlaw such self-calls until all
>> final fields are definitely assigned.  The
>> current language (for identity classes)
>> computes this point (of complete field
>> assignment) in order to enforce the rule
>> that the constructor cannot return until
>> all final fields have been definitely assigned.
> 
> FYI: A recent paper on the self-use-from-constructor problem: 
> https://dl.acm.org/doi/10.1145/3428243
> 
> 


Nice; it supports virtual calls in a constructor.
To me that seems a good stretch goal.
A simpler rule would define an “all initialized”
point in the constructor (no DU states, basically)
and open the floodgates there.  A more complicated
set of rules could allow earlier access to partially
DU objects, as a compatible extension.  In terms
of the paper, the initial conservative approach
does not allow (or perhaps warns on) any typestate
that has a remaining DU in it, while an extended
approach would classify accesses according
to which DU’s they might be compatible with.

An example of the difference would be:

primitive class Complex {
  float re, im, abs, arg;
  Complex(float re, float im) {
 this.re = re;
 this.im = im;
 if (CONSERVATIVE_AND_SIMPLE) {
   // we can easily do this today
   this.abs = Complex.computeAbs(re, im);
   this.arg = Complex.computeArg(re, im);
 } else {
   // later, enhanced analysis can allow this.m()
   this.abs = this.computeAbs();
   this.arg = this.computeArg();
 }
  }
}

Other observations:

The paper seems to formalize and extend the
DA/DU rules of Java 1.1 (which I am fond of),
under the term “local reasoning about initialization”.

The distinction between objects that are “hot” (old)
and “warm” (under construction) objects seems to
align with some of our discussions about confinement
of “larval” objects before they promote to “adult”.



Re: Revisiting default values

2021-06-30 Thread John Rose

On Jun 29, 2021, at 2:36 PM, Kevin Bourrillion wrote:

Speaking of orthogonality, there *is* an open question about how we interpret 
, and this is orthogonal to the question of whether  should 
be the "default default". We've talked about:
- It's interchangeable with null
- It's null-like (i.e., detected on member access), but distinct
- It's a separate concept, and it is an error to ever read it from fields/arrays

All still on the table.

Oh. Yeah, if you look at all the work we've all poured into how we manage null 
and its attendant risks, and ongoing work (perhaps charitably assume JSpecify 
will be successful! :-)), then it's kind of a disaster if there's suddenly a 
second kind of nullness. #nonewnulls



BTW, the combination of #nonewnulls (a principle
I whole-heartedly favor) and “it is an error to ever
read it” pencils out some capability to define
containers of nullable types but which reject nulls
in some way.  (Perhaps a subtle way:  Perhaps the
container starts out null but cannot be read
until a different initial value is written.)

Such containers would resemble, in certain
ways, containers for lazily-computed values.
(I.e., they have a dynamically-distinguished
undefined state, and they cannot return to
that state having left it.)  OTOH, a container
for a lazily-computed *nullable* value would,
in fact, require a second sentinel, not null,
to denote the unbound state; making that
sentinel escape would create a #newnull,
which would be bad.  Not sure how to square
this circle yet.
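A sketch of such a lazily-bound container for a *nullable* value (all names are mine): the unbound state needs a second sentinel distinct from null, and keeping that sentinel private is what prevents it from escaping as a new null.

```java
// Lazily-bound container for a nullable value.  The private UNBOUND
// sentinel marks the "never written" state; null is a perfectly legal
// bound value.  The sentinel never escapes, so no #newnull arises.
final class LazyCell<T> {
    private static final Object UNBOUND = new Object();  // the second sentinel
    private Object value = UNBOUND;

    boolean isBound() { return value != UNBOUND; }

    void bind(T v) {                     // leaves the unbound state exactly once
        if (value != UNBOUND) throw new IllegalStateException("already bound");
        value = v;
    }

    @SuppressWarnings("unchecked")
    T get() {                            // reads reject the unbound state
        if (value == UNBOUND) throw new IllegalStateException("unbound");
        return (T) value;
    }
}
```

This mirrors the dynamics described above: a dynamically-distinguished undefined state that, once left, cannot be returned to.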

Another random side-point, this time about
`withfield`:  It is IMO impractical to perform
default exclusion (null detection) for field
assignments in primitive class constructors,
because the default-ness of `this` is a complicated
dynamic property of all the fields together.
As a constructor executes, it may temporarily
revert `this` to the default value if it zeroes
out a field.  So the `withfield` instruction
should *not* perform default exclusion
on `this`.  (Of course an excluded default
would get checked for on exit from the
constructor.)

A similar point goes for `getfield`, perhaps,
though less strongly, because the Java
language would not use it.  But the JVMS
should probably attach default exclusion
either to both `withfield` and `getfield`
or neither.  This suggests that only method
invocations would perform default exclusion.

Which reminds me:  I think we should
allow calls to methods on `this` inside a
constructor.  (Do we?)  A clean way to
statically exclude incomplete values of `this`
would be to outlaw such self-calls until all
final fields are definitely assigned.  The
current language (for identity classes)
computes this point (of complete field
assignment) in order to enforce the rule
that the constructor cannot return until
all final fields have been definitely assigned.

For identity classes it would be nice to
*warn* on self-calls (or perhaps other
uses of `this`) before all finals are DA.
For primitives classes we can outlaw
such things, of course.

Basically, for both primitive and
identity class constructors, it is a likely
bug if you use `this` (for something
other than initializing `this`) before
all final fields are DA.  And yet some
constructors perform additional
operations (such as sanity checks)
in an object constructor, when the
object is in DA and “almost finished”
state.  It would be smart, I think, to
make the rules in this area, as coherent
as possible, for both kinds of classes.

IIRC the status quo is that no uses of
`this` are legitimate in a constructor
body except field writes (of DU fields)
and field reads (of DA fields).  I think
that is too strict, compared to the
laxness of the rules for identity classes,
and given the usefulness of error
checking methods being called from
constructors.



Re: [External] : Re: Revisiting default values

2021-06-30 Thread John Rose
P.S. Wikipedia gives for boojum:  “A fictional
animal species in Lewis Carroll's nonsense
poem The Hunting of the Snark; a particularly
dangerous kind of snark.”

“But if ever I meet with a Boojum, that day,
   In a moment (of this I am sure),
I shall softly and suddenly vanish away—
   And the notion I cannot endure!”

On Jun 30, 2021, at 5:16 PM, John Rose wrote:

Maybe, in some of these schemes, null
is not a primitive, but boojums and boxes
are the primitives, and null is a (safely)
boxed boojum?




Re: [External] : Re: Revisiting default values

2021-06-30 Thread John Rose

> On Jun 30, 2021, at 8:39 AM, Brian Goetz  wrote:
> 
> Now, let's talk more about null.  Null is a *reference* that refers to no 
> object.  For a flattened/direct object (P.val), null cannot be in the value 
> set (it's the wrong "kind"), though we can arrange for a primitive to behave 
> like null in various ways.  It's not clear whether this helps or hurts the 
> mental model, since it is distorting what null is.

This is a good point, if we can hold onto it.

Null is a magic one-off boojum that lives in
the space of reference types but makes
field references and method calls “softly
and suddenly vanish away”.

Having P.val.default.m() throw an NPE (under
default exclusion rules TBD) makes the null
boojum arise from a non-reference value,
but only just long enough to make the method
call go away.  (Boo—)

Dan’s proposal for default exclusion on
loads from uninitialized variables (such as
fresh array elements) amounts to another
boojum-like behavior, of making loads
go away (unless the variable has been
stored into previously).  Again, it’s not
directly associated with a reference,
but it is null-like, and perhaps NPE
is the right way to signal the fault.

Of course, our familiar null does not show
complete boojum behavior, because you can
read, write, and print null without yourself
vanishing away.  Likewise, even if we do
some sort of default exclusion, perhaps we
will allow defaults to flow in the same
(limited) paths that nulls can flow.  And
in that case, the #nonewnulls crowd would
expect that only the one value null would
appear, whenever such a value were
converted to a reference.

Maybe, in some of these schemes, null
is not a primitive, but boojums and boxes
are the primitives, and null is a (safely)
boxed boojum?

— John

Re: [External] : Re: Making Object abstract

2021-06-23 Thread John Rose
On Jun 17, 2021, at 4:40 AM, Remi Forax wrote:

As a stretch move, I think we can even retro-upgrade
the type checking of Objects::newIdentity with type
variable trickery, when IdentityObject becomes real.

Please see:

https://bugs.openjdk.java.net/secure/attachment/95170/Foo.java
https://bugs.openjdk.java.net/browse/JDK-8268919


I wonder if a simple way to avoid allowing any T is to allow specifying an 
intersection type as the parameter type and/or return type of methods (we have 
'var' inside the body)
so instead of

public static <T> T newIdentity() {

we can write

 public static Object & IdentityObject newIdentity() {

This requires a grammar change but it's exactly the type we want.

Yes.  I was trying to avoid that.  My attempt (see above)
is wrong because (as Dan points out) it can infer T to
be String, which is false.  The type variable T would
need lower bound of Identity, as well as an upper
bound (and erasure) of Object.  But, as with return
type intersections, there is no syntax to express
that, apparently.  Only wildcards can have
lower bounds, right?  And you can’t have a
(lower-bounded) wildcard for a return type:

public static List<? super IdentityObject> newIdentityList(); //OK
public static ? super IdentityObject newIdentity(); //not OK





Re: Making Object abstract

2021-06-16 Thread John Rose
On Jun 2, 2021, at 7:57 AM, Brian Goetz  wrote:
> 
> A minor bikeshed comment: We're asking users to change their `new Object()` 
> to `IdentityObject.newIdentity()`, and they may ask "why do I have to say 
> 'Identity' twice"?  (And by ask, I mean grumble, because we're already asking 
> them to change their working code.)
> 
> After a few minutes of thought, I think it might be a better fit to put this 
> at Objects::newIdentity.  The methods in Objects are conveniences that users 
> could write themselves, which this basically is -- there's nothing special 
> about this method, other than having a preferred alternative to `new 
> Object()` which users will understand.  So parking this where the Object 
> conveniences go seems slightly lower friction.

I think this is OK.

As a stretch move, I think we can even retro-upgrade
the type checking of Objects::newIdentity with type
variable trickery, when IdentityObject becomes real.

Please see:

https://bugs.openjdk.java.net/secure/attachment/95170/Foo.java
https://bugs.openjdk.java.net/browse/JDK-8268919

— John

Re: JEP draft: Universal Generics

2021-06-09 Thread John Rose
On Jun 8, 2021, at 2:21 PM, Dan Smith  wrote:
> 
> Please see this JEP draft:
> 
> http://openjdk.java.net/jeps/8261529
> 
> This is the third anticipated piece in our initial suite of Valhalla preview 
> features (along with JEPs 401 and 402). It's also the first step in the 
> revised generics story, to be followed up in the future with JVM enhancements 
> for better performance (including species & type restrictions).
> 
> This is entirely a language enhancement, and will be experienced by 
> developers as a number of new warnings for generic classes and methods. 
> Addressing the warnings makes generic APIs interoperate smoothly with 
> primitive value types and prepares them for the future JVM enhancements.

I like this JEP.  I think it proposes reasonable
tactics for repositioning type variables for
success with Valhalla.

From the way it reads (notably, where it
says syntax is subject to change), it seems
a provisional design, to be validated by
actual experience using the language
features to create new library APIs and
adapt existing ones to help deal with
null pollution gracefully.  To put it
negatively, I don’t fully trust the
design here, until we have a chance
to use it for some time with real APIs.
But I think it’s a very reasonable first
cut.

One item that is a wrong note for me
is the place where you say, “The proof
is similar to the control-flow analysis
that determines whether a variable has
been initialized before use.” I know
something about those proofs, having
contributed the “definite unassignment”
rules in Java 1.1.  The basic rules do not
produce different answers along diverging
control paths (such as the “then” and
“else” sides of an if).  The rules for
flow-based assignment in pattern
matching do something like this,
with the “assigned when true” type
clauses, but they are tied to new
testing syntax (instanceof patterns).
But null testing can be done in many
ways, and so there is no “bright line”
for determining if a variable is null
or not on a taken path.  It’s a slippery
slope.  If you look at null-checking
frameworks, or JITs, you’ll see dozens
of rules regarding null deduction.
Also, none of the existing DA/DU
rules *change* the type of a variable;
they just make it available or not
available, but you are promising a
rule which makes the *same* variable
nullable along some paths and not
nullable along others.  That’s not
a small or incremental change on
the existing language machinery.

All this is to say, I don’t think you
can display a clean proof, based on
a clean language design, that will
get what you are claiming.

I have a better proposal instead:
Just make sure it is possible to build
user-defined API points which have
the appropriate null-isolation effects.

For example, this should type-check,
and should be a usable idiom for
statically checked null control:

<T> void foo(T.ref xOrNull) {
   T x = Objects.requireNonNull(xOrNull);
}

The internals of Objects.requireNonNull
probably contain an unchecked cast from
T.ref to T, where the language cannot “see”
the invariant, but the programmer can.
A @SuppressWarnings completes the story.

A similar point might work for the API of
Class.cast, if we can figure out how to
find the right Class witnesses:

<T> void foo(Class<T> witness, T.ref xOrNull) {
   T x = witness.cast(xOrNull);
}

<T> void foo(Class<T.ref> witness, T.ref xOrNull) {
   T x = witness.valueType().cast(xOrNull);
}

(The fact that valueType returns NONE instead
of self is a problem here.)

The rules for instanceof can also be adjusted
to narrow from a target of T.ref to a variable
binding of T.  This is (IMO) a better use of
language complexity than an open-ended
hunting season for nulls.

Anyway, I think the above ideas are less
of a blind alley than promising magic
flow checking of nulls.

— John



Re: [External] : Re: consolidated VM notes for primitive classes

2021-04-27 Thread John Rose
On Apr 27, 2021, at 7:27 AM, Peter Levart wrote:

If this did see an implementation in the VM, we would essentially get 
muti-referent Ephemeron(s) out of it. Not very easy to implement though.


Sorry, this is not correct. The rules for Ephemeron(s) are different. If 
primitive object became unreachable when all of its component identity object 
references became unreachable, then we would get Ephemeron. The rules in the 
document (at least one) are easier to implement.

Yep.  Brian and Jim Laskey have a POC that works.
Also (as the GC folks observe) this is similar to the
rule we use in HotSpot for liveness of “nmethods”
(compiled code blobs).  As soon as one weak ref in
a code blob goes dead, the whole thing can be
recycled.

The difference here is between “and” and “or”
connectives.  In one case, a weak ref is queued
when *any* of its sub-references is queueable,
while in the other (ephemeron) case, we wait
until *all* are queueable.
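A toy model of the two connectives (names are mine; it is driven by explicitly cleared WeakReferences rather than a real GC cycle, so the behavior is deterministic):

```java
import java.lang.ref.WeakReference;

// The "or" rule queues a composite holder as soon as ANY component
// referent is cleared (the nmethod-style rule); the "and" rule waits
// until ALL components are cleared (the ephemeron-style rule).
final class Liveness {
    static boolean queueableOr(WeakReference<?>... refs) {
        for (WeakReference<?> r : refs)
            if (r.get() == null) return true;   // any dead => whole dead
        return false;
    }

    static boolean queueableAnd(WeakReference<?>... refs) {
        for (WeakReference<?> r : refs)
            if (r.get() != null) return false;  // any live => whole live
        return true;
    }
}
```

With all referents live the two rules agree; they diverge exactly in the window where some but not all components have died.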

— John


Re: [External] : Re: consolidated VM notes for primitive classes

2021-04-27 Thread John Rose
On Apr 27, 2021, at 7:37 AM, Peter Levart wrote:


And neither is that. I had to look back at the specification. Ephemeron refers 
to a pair of referents, but they are not equivalent. The reachability of the 
1st referent governs the reachability of the 2nd. Sorry for these inappropriate 
comments.


I’m glad you looked it up!  Perhaps there is
a logical implication connective in that third
definition.


Re: [External] : Re: Parametric VM class file format

2021-04-21 Thread John Rose
I pushed an updated version of the PVM document
to its new home in valhalla-docs:

https://github.com/openjdk/valhalla-docs/blob/main/site/design-notes/parametric-vm/parametric-vm.md

Here is the produced HTML (we don’t have this
automated yet):

http://cr.openjdk.java.net/~jrose/values/parametric-vm.html

It attempts to address recent EG comments.  In order
to track our progress I committed the first public version
to github, and then immediately overwrote it with the
current version.

Here are the most recent changes:

https://github.com/openjdk/valhalla-docs/commit/e6d16bfed0768ade802e58808b79c4c44b5ec1cb

At your prompting, Remi, I have changed “parameter” to
“selector” as the term which describes the cookie that
a client uses to request a specialization.

— John


Re: [External] : Re: Parametric VM class file format

2021-04-21 Thread John Rose
On Apr 21, 2021, at 8:32 AM, fo...@univ-mlv.fr wrote:
> 
> - Original Message -
>> From: "John Rose" 
>> To: "Remi Forax" 
>> Cc: "valhalla-spec-experts" 
>> Sent: Wednesday, 21 April 2021 08:43:34
>> Subject: Re: Parametric VM class file format
> 
>> On Apr 20, 2021, at 9:40 AM, Remi Forax  wrote:
>>> 
>>> Hi all,
>>> at least as an exercise to understand the proposed class file format for the
>>> parametric VM, i will update ASM soon (in a branch) to see how things work.
>> 
>> Thank you!
>> 
>>> As usual with ASM, there is the question of sharing the same index in the
>>> constant pool,
>>> i.e. should two anchors (SpecializationAnchor) that have the same kind and 
>>> the
>>> same BSM + constant arguments share the same constant pool index ?
>> 
>> Yes, I don’t see why not.  For a Class-level SA it would
>> be a bug if there were two of them in one classfile.
> 
> 
> As i said earlier, I hope the spec will say that you can have more than one 
> SpecializationAnchor with the kind class, given that only one will be used by 
> the VM, the one referenced by the Parametric attribute at class level.

I added this:

> It is permitted for a constant pool to contain
> `CONSTANT_SpecializationAnchor` items which are unused.  But typically
> a `PARAM_Class` anchor, if it exists, will be unique in its `class`
> file, and be referenced by the `Parametric` attribute of the class,
> and other anchors will be used to assign parametricity to the class's
> methods, either singly or in groups.  It is highly probable that two
> specialization anchors with the same kind and bootstrap method are in
> fact interchangeable: Just as with `CONSTANT_Dynamic`, there is no
> intention to provide "hooks" for structurally identical constants that
> have different meanings.
> 
>> We did without ldc C_Class for a while but added it
>> for similar reasons.  Once we added it we had no
>> desire to go back to the workarounds.  I predict the
>> same for “C_Species” which is C_Linkage[C_Class,].
> 
> It may not be a problem but it means that ArrayList.class is 
> represented by C_Linkage[C_Class] while ArrayList.class is represented by 
> Condy[C_Class, C_SpecializationAnchor] which is not very symmetric.

The asymmetry is intentional because T is not a named
entity in the JVM while every class is.  As the document
says, we can consider adding CP sugar to perform common
tasks, such as reifying a type parameter from a specialization
anchor, but for now condy is our friend.

(It occurs to me that we *might* do, as sugar, CLR-like
syntaxes where an internal constant CONSTANT_Class[T1;]
could be specially decoded into some sequence of hardwired
operations that go and fetch something at position 1 in some
local specialization anchor.  The bigger adventure of putting
T1; into descriptors, making an even more CLR-like system,
will have to wait a long time.  What we are building has
fewer magic names, and more metaprogramming, than
CLR generics.)

> Also ldc C_Class is now subsumed by Condy, so having everything be managed by 
> Condy may make more sense now that Condy exists.

But why?  If I have a nice little nail file C_Class
why should I reach for my chainsaw C_Condy?

> 
>> 
>>> And second question, is there a ldc Linkage with the linkage referencing
>>> something else than a class ?
>> 
>> No!  It would be possible to find a meaning for such
>> a thing, but it would probably interfere with ldc
>> of C_Linkage[C_Class,].  (I tried.)  The workaround
>> is not bad:  Just wrap the C_Linkage[C_Methodref]
>> in a C_MethodHandle, et voilà.
>> 
>> An earlier draft made ldc of a C_Linkage recover
>> the SpecializationAnchor object, with the theory
>> that from there condy gets you everything.  But
>> I turned away from that design because (a) it was
>> clunky to use, and (b) it exposed SA objects which,
>> as I came to understand, should really be encapsulated
>> and private to their defining class.
> 
> ahh, i'm lost, it's not clear to me if a SpecializationAnchor can be a 
> bootstrap constant or not ?

(Is there a residual place in the doc which seems to say that
it would be good to ldc a C_Linkage?  I thought I removed
all of those statements.)

Anyway, you can ldc a C_Anchor and/or observe it in a BSM,
and that’s how you start the metaprogramming adventure:

> A `CONSTANT_SpecializationAnchor` constant is a (new sort of) loadable
> constant (§5.1).  The resolved value of this constant is a mirror to a
> set of specialization decisions, also called a `SpecializationAnchor`
> (§4.1).

Then you look at the rather wide SA API, and condy is
your friend from there on.  If we want, we can add more
sugar, but first we should go light on sugar.

— John

consolidated VM notes for primitive classes

2021-04-21 Thread John Rose
Brian and I hammered out a document this week that
captures what we think is emerging as our shared
understanding of how to adapt the JVM to support
primitive classes.

It is still white-hot, not even off the press, but I think
it is worth looking at, even in its unfinished state.

https://github.com/openjdk/valhalla-docs/blob/main/site/design-notes/state-of-valhalla/03-vm-model.md

That is the JVM side, only.  Most of it is already
prototyped in HotSpot, some is not.

I’ll let Brian speak for the valhalla-doc repository
as a whole, but I wanted to get this out there for
tomorrow’s meeting.

— John


Re: Parametric VM class file format

2021-04-21 Thread John Rose
On Apr 20, 2021, at 9:40 AM, Remi Forax  wrote:
> 
> Hi all,
> at least as an exercise to understand the proposed class file format for the 
> parametric VM, i will update ASM soon (in a branch) to see how things work.

Thank you!

> As usual with ASM, there is the question of sharing the same index in the 
> constant pool,
> i.e. should two anchors (SpecializationAnchor) that have the same kind and 
> the same BSM + constant arguments share the same constant pool index ?

Yes, I don’t see why not.  For a Class-level SA it would
be a bug if there were two of them in one classfile.

> And same question with two linkages (SpecializationLinkage).

Definitely.  Just like you’d want to unique-ify
CONSTANT_Class items.

> I believe the answer is yes for both, the index is shared.
> 
> The other questions are about "ldc Linkage", first, why do we need a ldc 
> Linkage with a Linkage that reference a class given that we already have 
> Condy ?

Condy can do a lot of this stuff, but it shouldn’t duplicate
work that the JVM is already likely to have done.
If you have code using a C_Linkage[C_Class,] the JVM
has probably resolved it to a species.  In that case,
a condy is wasted work.

We did without ldc C_Class for a while but added it
for similar reasons.  Once we added it we had no
desire to go back to the workarounds.  I predict the
same for “C_Species” which is C_Linkage[C_Class,].

> And second question, is there a ldc Linkage with the linkage referencing 
> something else than a class ?

No!  It would be possible to find a meaning for such
a thing, but it would probably interfere with ldc
of C_Linkage[C_Class,].  (I tried.)  The workaround
is not bad:  Just wrap the C_Linkage[C_Methodref]
in a C_MethodHandle, et voilà.

An earlier draft made ldc of a C_Linkage recover
the SpecializationAnchor object, with the theory
that from there condy gets you everything.  But
I turned away from that design because (a) it was
clunky to use, and (b) it exposed SA objects which,
as I came to understand, should really be encapsulated
and private to their defining class.

— John

Re: EG meeting, 2020-12-16

2020-12-16 Thread John Rose
On Dec 16, 2020, at 11:00 AM, Dan Smith  wrote:
> 
> Rémi raised some questions about our story for int vs. Integer vs. legacy 
> uses of 'new Integer'—how many implementation classes are there? (should we 
> use species?); what does reflection look like? We agreed it still needs some 
> polishing.

Here’s what I *hope* we can do, if/when we figure out how
to make an abstract class a super of *both* p-class and i-class.

And, then, if/when we figure out how to endow a class with
both p-class factories and i-class constructors.  (See discussion
about Enum, which has a similar problem.)

(Caveat:  I think these are technically feasible but they might
turn out to be too expensive to carry out, compared with other
technical goals.  We might back off to some less elegant solution
with sealed hierarchies and multiple types.  But I can hope.)

At that point we can give jl.Integer both kinds of constructors,
and allow it to instantiate both kinds of instances.  How could
that be possible?  Well, suppose specialization is not a purely
VM-level activity, but (as seems likely) defers partially to a
user-supplied bootstrap method, which decides what’s in any
given species of a class.  Next, suppose that the BSM is given
the choice (for a suitably declared specializable class) to make
species *of both kinds*.  Finally, declare Integer that way, and
make its BSM choose a legacy i-species along some paths, and
a new p-species along the preferred paths.  At least as a formal
possibility, this tactic suggests that the extra degrees of freedom
required could be confined into *one class* (Integer) and
managed using species distinctions.

Being able to do this is clearly not a primary goal of any
reasonable specialization story, but it could turn out to be
low-hanging fruit, if we are lucky.  Hence my hope.

— John

Re: Inline Record vs JLS / Reflection

2020-12-16 Thread John Rose
On Dec 16, 2020, at 12:39 PM, fo...@univ-mlv.fr wrote:
> 
> De: "John Rose" 
> 
> The last is cleanest; the cost is resolving some technical
> debt in Valhalla, which is allowing more kinds of supers
> for primitive classes.  There’s no firm reason, IMO, why
> Record could not be a super of both primitive and identity
> classes, all of which are proper records.  Basically we need
> to make interfaces and abstract classes look a little more
> similar, with respect to the requirements of primitive
> classes.
> 
>> yes, in the case of enum it's more difficult because ordinal and name are 
>> fields in java.lang.Enum.

I have a trick up my sleeve for that:  Migrate Enum
to be a parametric class, and parameterize the fields.
The effect of this will be to re-allocate them in every
subtype that asserts the parameter.  The “raw” version
would continue to behave as today, for compatibility
reasons.

> Spoiler alert:  I think the final solution will endow
> abstract classes with *both* abstract and concrete
> constructors.  The former will serve primitive
> classes and the latter will serve identity classes.
> Record will be such an abstract class.
> 
>> By abstract constructors, i suppose you mean empty constructors so they can 
>> be bypassed when creating a primitive type.

I mean constructors (<init> methods) which are ACC_ABSTRACT.
(Or some equivalent.)  The point is they are not only empty but
have no Code attribute.

>> It will not work with java.lang.Enum.
>> For enums, we need have both constructors and factory methods and a way to 
>> ensure that they are semantically equivalent.

You are right about the argument-taking constructors.  Something
more is needed there than I have outlined.  But I think even that
difficulty is not a blocker.  For example, if a primitive class that
inherits from Enum can (somehow) decouple from the name and
ordinal fields of Enum (say, using a specialization trick as above)
then it can probably also assume responsibility for managing its
own name and ordinal values.  They could be wired up using
method overrides replacing Enum methods.  (Hands waving…)

>> One possibility is to have the VM generating the factory methods code only 
>> inside the primitive type from the chain of superclasses, doing the 
>> transformation that is currently done by the compiler at runtime.

Yes.  A good question is how to formalize such a transformation
using as little “special pleading” as possible, preferably none at all.

>> It can work that way, if I want a primitive enum, the constructor of 
>> java.lang.Enum has to be marked as transformable to a factory method, same 
>> thing for the constructor of the primitive enum itself.

Marked as transformable, or simply marked as irrelevant.
This is related to a “wish list” item for specialized generics,
which is optional fields and methods.  (The “isPresent” field
of an Optional<int> is needed, but not for a reference Optional.)
Today’s Enum constructor could be made optional (using
one pretext or another), and simply ignored for p-Enums.

>> For java.lang.Enum an for any classes of a primitive enum, an empty factory 
>> method is generated with a descriptor, an empty body, and an attribute 
>> pointing to the corresponding constructor.
>> At runtime, when the class is loaded, the VM inserts the correct code in the 
>> factory method of the non abstract class.

Or have *both* the factory method *and* classic constructor,
with suitable gating logic about which may be used when.
(This would be a doubling down on the tactic of making
abstract classes supers of *both* p-classes and i-classes.)

>> If the VM generates the code, we are sure that the constructor and the 
>> factory method are both equivalent.

We could also trust the author of Enum to certify this.

>> Another solution is to have a special verifying pass that verify that the 
>> constructor and the factory method are both equivalent, but it seems harder 
>> to me.

At some point, you have to trust a human author.

— John




Re: Inline Record vs JLS / Reflection

2020-12-16 Thread John Rose
On Dec 16, 2020, at 11:07 AM, Dan Smith  wrote:
> 
> I don't think we have a good answer right now, but it's something we will 
> want to address at some point. A solution would have to look like one of:
> 
> - Ask clients (e.g., "is this a record?" code) to adapt to the presence of 
> the '$ref' class
> - Modify reflection to hide the '$ref' superclass somehow
> - Change the translation strategy to not disrupt the superclass hierarchy

The last is cleanest; the cost is resolving some technical
debt in Valhalla, which is allowing more kinds of supers
for primitive classes.  There’s no firm reason, IMO, why
Record could not be a super of both primitive and identity
classes, all of which are proper records.  Basically we need
to make interfaces and abstract classes look a little more
similar, with respect to the requirements of primitive
classes.

Spoiler alert:  I think the final solution will endow
abstract classes with *both* abstract and concrete
constructors.  The former will serve primitive
classes and the latter will serve identity classes.
Record will be such an abstract class.

(Alternatively, and more clumsily, Record could
be refactored as a proper interface, but sealed
to PrimitiveRecord and IdentityRecord, and
javac would translate to one or the other.  The
methods on JLO would not be defaults on
record but would be duplicated on the two
sealed subtypes.)
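The clumsier alternative in that parenthetical can be sketched in today's syntax. The names here (RecordSuper, PrimitiveRecord, IdentityRecord) are illustrative stand-ins, not real JDK types:

```java
// A sketch of Record refactored as a sealed interface with two subtypes;
// javac would translate each record declaration to implement one of them,
// and the Object methods would be re-declared on both rather than defaulted.
sealed interface RecordSuper permits PrimitiveRecord, IdentityRecord { }

non-sealed interface PrimitiveRecord extends RecordSuper { }
non-sealed interface IdentityRecord extends RecordSuper { }

class SealedRecordSketch {
    // e.g. an ordinary (identity) record would be translated like this:
    record Pair(int x, int y) implements IdentityRecord { }

    public static void main(String[] args) {
        System.out.println(new Pair(1, 2) instanceof RecordSuper); // true
    }
}
```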

— John

Re: Using a Condy instead of a Constant_Utf8

2020-12-02 Thread John Rose
On Dec 2, 2020, at 2:53 AM, Remi Forax  wrote:
> 
> There is one case where i dynamically patch a method descriptor so I can 
> select how many arguments will be sent to a closure,
> i.e. in the bytecode i put all arguments on the stack but because i've 
> patched the callee descriptor, only some of them will be used.

One workaround out of many:  Put a switch in your class template,
one case per supported arity (limited by the number of stacked items).
Then use a fixed descriptor for the call on each arm.  Drive the switch
from a condy.
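A minimal sketch of that workaround in plain Java, assuming the arity arrives as a constant (e.g. from a condy); the class and method names are made up for illustration:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ArityDispatch {
    public static String join1(String a) { return a; }
    public static String join2(String a, String b) { return a + b; }

    // Each switch arm corresponds to one invoke with its own fixed
    // descriptor; MethodHandle.invoke adapts the arguments as needed.
    public static Object call(MethodHandle target, int arity, Object a, Object b)
            throws Throwable {
        return switch (arity) {
            case 1 -> target.invoke(a);
            case 2 -> target.invoke(a, b);
            default -> throw new IllegalArgumentException("unsupported arity " + arity);
        };
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle h2 = MethodHandles.lookup().findStatic(ArityDispatch.class,
                "join2", MethodType.methodType(String.class, String.class, String.class));
        System.out.println(call(h2, 2, "x", "y")); // xy
    }
}
```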

BTW, if you need to down-cast from Object, use Class::cast on a
dynamic Class constant.   In the JIT those fold up the same as
checkcast.  Or just use asType.  I’m guessing you are calling
MH::invokeExact or some other low-level “erased” API point,
so casting is not strictly necessary but it may help the JIT.
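A small illustration of those two hints (a sketch, not tied to the closure machinery above): Class::cast on a constant Class folds like a checkcast, and MethodHandle::asType inserts the same casts around an "erased" handle.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class CastDemo {
    public static void main(String[] args) throws Throwable {
        Object o = "hello";
        String s = String.class.cast(o);  // JIT folds this like a checkcast

        // Erase String::length to (Object)Object; asType does the casting.
        MethodHandle length = MethodHandles.lookup()
                .findVirtual(String.class, "length", MethodType.methodType(int.class))
                .asType(MethodType.methodType(Object.class, Object.class));
        Object n = length.invokeExact(o);  // exact match for (Object)Object
        System.out.println(s + ":" + n);   // hello:5
    }
}
```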

— John

Re: Source code analysis: calls to wrapper class constructors

2020-11-03 Thread John Rose
On Oct 28, 2020, at 1:05 PM, fo...@univ-mlv.fr wrote:
> 
> I've never seen such bytecode shapes but I don't think i've ever seen a 
> classfile compiled with a version which was less than Java 1.2.

…I’ve seen bytecode shapes, such bytecode shapes as
would freeze the marrow.

It was dark and rainy, that night I first confronted
the Bytecode Obfuscator.

They say he no longer stalks this world, but tell me:
Why do I clutch my instruction patterns so tightly?

https://owasp.org/www-community/controls/Bytecode_obfuscation




Re: Source code analysis: calls to wrapper class constructors

2020-10-28 Thread John Rose
On Oct 28, 2020, at 2:32 PM, John Rose  wrote:
> 
> invokestatic Integer.$pop2$valueOf(Object,int)V

That would be invokestatic Integer.$pop2$valueOf(String,int,String)V

And the dummy object could be an Integer (using a condy) if we don’t
want to edit the stack maps that might mention the Integer.  They
might be present if the integer expression contains control flow.

So, invokestatic Integer.$pop2$valueOf(Integer,int,Integer)V



Re: Source code analysis: calls to wrapper class constructors

2020-10-28 Thread John Rose
On Oct 28, 2020, at 10:49 AM, Dan Smith  wrote:
> 
> You're right that this disrupts verification; I think we can address this 
> pre-verification by rewriting the StackMapTable, eliminating all references 
> to 'uninitialized(Offset)' and shrinking the stack by two.

Or we can try to keep the verification as-is by emulating the stack effects.
This requires inserting instructions, I think, but avoids reshaping the stack.

Maybe:

new Integer; dup; …(stuff that pushes int)…; invokespecial Integer.<init>(int)V
⇒
ldc_w (String)dummy; dup; …(stuff that pushes int)…; invokestatic 
Integer.valueOf(int)Integer; swap; pop; swap; pop

Maybe use a helper which can “gobble up” the stack junk in one go:

invokestatic Integer.valueOf(int)Integer; swap; pop; swap; pop
⇒ 
invokestatic Integer.$pop2$valueOf(String,String,int)Integer

If the dummy value has migrated somewhere random, it could be picked up and 
popped:

new Integer; astore L42; …(stuff that pushes int)…; aload L42; invokespecial 
Integer.<init>(int)V
⇒
ldc_w (String)dummy; astore L42; …(stuff that pushes int)…; invokestatic 
Integer.valueOf(int)Integer; swap; pop; aload L42; pop

As a further improvement on this theme, note that the dummy always has two 
copies, one to feed to invokespecial  and one to return to the user.  The 
one to return to the user might be at TOS, or it might be elsewhere (in L42 or 
deeper on stack).  We could do a peephole transform which finds the bytecodes 
that pull up the dummy value, move them *before* the $pop2$valueOf helper, and 
the net size change of bytecodes is zero.  The location of the invokespecial 
<init> might move a byte or two later.

So:

new Integer; …(stuff like dup that stores a duplicate ref)…; …(stuff that 
leaves the new ref on stack, then pushes the int)…; invokespecial 
Integer.<init>(int)V; …(unrelated stuff)… …(stuff that ensures the replicate 
ref is now at TOS)…
⇒ 
ldc_w (String)dummy; …(same stuff like dup that stores a duplicate ref)…; 
…(same stuff that leaves the new ref on stack, then pushes the int)…;  …(same 
stuff that ensures the replicate ref is now at TOS, but moved before the 
invoke)…; invokestatic Integer.$pop2$valueOf(Object,int)V; …(same unrelated 
stuff)…

This more elaborate scheme works for both the simple “dup” case and for the 
more complicated “astore L42” case.  I don’t think it requires changing stack 
maps.

Hours of educational play for nerds 14 and up!

> The bigger limitation, which I don't think you run into in any 
> javac-generated code, is that you can put a copy of the uninitialized object 
> reference anywhere you want—in locals, duplicated 15 times on the stack, etc. 
> That's the point where I'm guessing we give up.




Re: Source code analysis: calls to wrapper class constructors

2020-10-28 Thread John Rose
Please accept the Tiger Woods Code Golf award for that one!

It only works if the “dup” output (after “new”) is still contiguous
on the stack.  That won’t be true if javac for some reason spilled
the result of “new” to a local instead of holding it on stack.

IIRC one reason to spill from stack to locals during expression
evaluation is if there is some kind of complicated control flow
inside the expression.  Different javac’s historically have
different policies about stuff like that.

> On Oct 28, 2020, at 4:25 AM, Remi Forax  wrote:
> 
> - Mail original -
>> De: "John Rose" 
>> À: "daniel smith" 
>> Cc: "valhalla-spec-experts" 
>> Envoyé: Mercredi 28 Octobre 2020 05:56:29
>> Objet: Re: Source code analysis: calls to wrapper class constructors
> 
>> On Oct 27, 2020, at 12:27 PM, Dan Smith  wrote:
>>> 
>>> This tooling will support common bytecode patterns like 'new Foo; dup; ...;
>>> invokespecial Foo.<init>;', but will not be a comprehensive solution.
>>> (Mimicking the behavior of instance initialization method invocation in full
>>> generality would be a very difficult task.)
>> 
>> One of the reasons it’s not going to be comprehensive
>> is code like new Integer(complicatedExpr()), in which
>> the `new` and `invokespecial <init>` are separated
>> by (almost) arbitrarily complex bytecode.  The two
>> instructions don’t even have to be in the same basic
>> block (at the bytecode level):
>> 
>> new Integer(foo() ? bar() : baz())
>> // compiles to 4 BB’s in a diamond
>> 
>> If we add switch expressions with large sub-blocks,
>> I think we get peak separation of the start and
>> end parts of the new/init dance:
>> 
>> new Integer(switch (x) {
>> case 1 -> { complicatedBlock: try { … } catch ... ; return 0;
>> default -> { for (;;) … }} )
>> 
>> All of this gives me yet one more reason we would have
>> been better off with factory methods instead of
>> open-coding the new/init dance.  It was, in hindsight,
>> a false economy to open code the object creation “guts”
>> instead of putting them in factory API points.
>> 
>> And with an eye toward future evolutions of legacy code
>> (legacy code not yet in existence!), and uniformity with
>> the factory methods of inline classes, let’s try harder
>> to get rid of the new/init dance for identity objects.
> 
> I believe there is a quick and dirty trick,
> replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL 
> java/lang/Integer  (I)V by INVOKESTATIC java/lang/Integer valueOf 
> (I)Ljava/lang/Integer;
> 
> It has to be done after the code is verified because the new execution 
> doesn't push java/lang/Integer on the stack anymore before calling the 
> arbitrary init expression thus any StackMapTables in between the NOPs and 
> INVOKESTATIC are invalid.
> 
>> 
>> — John
> 
> Rémi



Re: Source code analysis: calls to wrapper class constructors

2020-10-27 Thread John Rose
On Oct 27, 2020, at 12:27 PM, Dan Smith  wrote:
> 
> This tooling will support common bytecode patterns like 'new Foo; dup; ...; 
> invokespecial Foo.<init>;', but will not be a comprehensive solution. 
> (Mimicking the behavior of instance initialization method invocation in full 
> generality would be a very difficult task.)

One of the reasons it’s not going to be comprehensive
is code like new Integer(complicatedExpr()), in which
the `new` and `invokespecial <init>` are separated
by (almost) arbitrarily complex bytecode.  The two
instructions don’t even have to be in the same basic
block (at the bytecode level):

new Integer(foo() ? bar() : baz())
// compiles to 4 BB’s in a diamond

If we add switch expressions with large sub-blocks,
I think we get peak separation of the start and
end parts of the new/init dance:

new Integer(switch (x) {
  case 1 -> { complicatedBlock: try { … } catch ... ; return 0;
  default -> { for (;;) … }} )

All of this gives me yet one more reason we would have
been better off with factory methods instead of
open-coding the new/init dance.  It was, in hindsight,
a false economy to open code the object creation “guts”
instead of putting them in factory API points.

And with an eye toward future evolutions of legacy code
(legacy code not yet in existence!), and uniformity with
the factory methods of inline classes, let’s try harder
to get rid of the new/init dance for identity objects.
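A small, concrete illustration of the factory-method point: the whole argument expression nests inside one invokestatic, and the factory is free to cache, which an open-coded new/init dance can never do.

```java
public class FactoryVsNew {
    public static void main(String[] args) {
        // One call, no split new/<init>, even with control flow in the argument:
        Integer a = Integer.valueOf(args.length > 0 ? args.length : 42);

        // valueOf may return a cached instance for small values:
        System.out.println(Integer.valueOf(42) == Integer.valueOf(42)); // true
        System.out.println(a);
    }
}
```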

— John

Re: Source code analysis: calls to wrapper class constructors

2020-10-27 Thread John Rose
On Oct 27, 2020, at 1:36 PM, Dan Smith  wrote:
> 
> I'm not sure whether there's a mechanism in HotSpot to generate warnings 
> about deprecated APIs at link/run time. It does seem like it would be a 
> reasonable feature...

+1

There is no such feature at present; maybe something
could be built on top of the debugger interface.  A quick
look at the event index [1] does not turn up dynamic
linkage (resolution) events, which would be the obvious
place to start.

[1]: https://docs.oracle.com/javase/1.5.0/docs/guide/jvmti/jvmti.html#EventIndex



Re: no good default issue

2020-07-31 Thread John Rose
On Jul 31, 2020, at 12:41 PM, Brian Goetz  wrote:
> 
> As far as I can tell what you're suggesting, it is that, when we detect a 
> field is not initialized, we initialize it for you with some sort of default. 
>  But that brings us back to the main problem: what if the class _has no good 
> default_?   With what do we initialize it?  

I think he was going back to the old idea of an opt-in default value,
which is then “stamped” all over arrays and fields.  A very natural
notation for this would be a no-arg constructor.

In this world, the vdefault bytecode would be privileged (usable
only inside the same capsule, to bootstrap value creation).
Where we have public uses of vdefault today, we would instead
have an API point, a call to the no-arg constructor factory
method.

The no-arg constructor would (presumably) be run just once,
the first time needed, and the value stored somewhere.  The
JVM would want to special-case this somehow.  Perhaps the
API point would surface as a well-known public-static-final?
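A plain-Java sketch of that opt-in default idea: a no-arg creation path computed once and surfaced as a well-known constant. The class and the chosen default are invented for illustration; real Valhalla support would let the JVM stamp this value over fields and array elements.

```java
public final class Temperature {
    public final double kelvin;
    private Temperature(double kelvin) { this.kelvin = kelvin; }

    // The "no-arg constructor" surfaced as a public-static-final: it runs
    // once at class initialization, standing in for the all-zero default (0 K).
    public static final Temperature DEFAULT = new Temperature(273.15);

    public static Temperature of(double kelvin) { return new Temperature(kelvin); }
}
```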

The rest of the VM would do some tricks Dan is suggesting,
to ensure that non-private uses of the type would always
refer to the public default value.  I think it would still be
desirable for the class itself to work with “dangerous”
all-zero instances  (after all, it’s the class’s business to
define exactly how dangerous they are), so that for example
array creation inside the class might be faster than array
creation outside the class.

Personally, I view such tactics as possible but expensive,
and would like to try to get by without JVM support for
them, to start with.  The JVM engineering teams are already
overworked for Valhalla.

— John

Re: The fate of int, int.ref, and Integer

2020-06-06 Thread John Rose
On Jun 5, 2020, at 5:43 AM, Brian Goetz  wrote:
> 
> The move of saying `Integer` *is* `int.ref` makes these problems go away.  
> This seems too good to pass up preemptively.

I agree.  And this leads us into a maze of twisty passages.
Full of uninsulated electrified wires and third rails to avoid.

One tactic I like to get through the maze is to find a way to
add some ad hoc polymorphism to Integer, while keeping it
sealed up as a final class, like today.  It seems to need to cover
both good old identity objects (like new Integer(42)) and also
new inline objects ((Integer)42, boxed as before, but via a new
subtype relation).  This means there are two object types
floating around, identity-Integer and int, plus an API type,
Integer-as-super.  I think the notion of species (as a finer
grained subdivision of types, under class) can be used to
create the necessary distinctions, without introducing
a lot of new types, and breaking reflection.

In the spirit of brainstorming, here are more details on
such a path, a that might lead us through the maze.

Given this:

int.ref id = new Integer(42);  //identity object
int.val x = id;
int.val y = 42;
int.ref z = y;

…we could choose to arrange things so that all of
x, y, z are inline objects (true “ints”), while id retains
its special flavor.  Also, Object.getClass could report
Integer.class for *all* of those values (even y).  This
could be justified by revealing “int” as, not a class,
but a *species* of Integer.  So that Object.getSpecies
would report the further details:  For id it is the
the version of Integer which holds identity (which
doesn’t need a name I suppose) and for x/y/z it
reports the species reflector for int.

If we further use a muddied java.lang.Class to
continue to represent non-classes like “int”,
we have to double down on the idea of a “crass”,
or “runtime class-like type”.  In that case we can
have getSpecies return a crass, and then:

assert 42.getClass() == Integer.class;
assert 42.getSpecies() == int.class;
assert new Integer(42).getSpecies() == (something else);






Re: The fate of int, int.ref, and Integer

2020-06-04 Thread John Rose
On Jun 4, 2020, at 7:00 PM, Kevin Bourrillion  wrote:
> 
> Hello friends,
> 
> A couple thoughts on the fate of the primitives and wrappers.
> 
> First, on nomenclature, I think the most useful definitions of what it means 
> to be an "inline type" are those that reveal the primitives to already be 
> inline types. Java's always had them, but it hasn't had user-defined inline 
> types, because it hasn't had inline classes (and classes are how we 
> user-define types). That's clean, and it's not even a retcon.

+1  We have tried to keep “works like an int” as a goal.  I don’t think we’ve
compromised too much away from that; I think your formula works, except
for the technical fact that “inline” is always followed by “class”.

> Also on nomenclature, I want to avoid phrases like "you can expand the set of 
> primitives"; no, I still think that "primitives" should always apply to the 
> eight predefined, irreducible inline types. User-defined inline types are 
> always composite (how could they not be?).

Yes.  But:  I expect that JVMs will sometimes secretly define things that look
like inline classes but in fact are physically atomic (except bitwise of 
course).
Vectors in AVX are like this:  They go in one register, not many.  I expect
such things to be hidden from the end user, in places like jdk.internal.types,
and wrapped in ordinary inline wrapper classes for public consumption.

> 
> I approve of the idea of writing int.java etc. files in order to add methods 
> to `int`, and add interfaces to `int.ref`. It is fine if these files are 
> essentially "fake" (they don't actually bring the primitives into existence 
> as other classes do). I think attempts to try to make them look "real" would 
> mean letting them do things other inline types can't and it definitely 
> wouldn't seem worth it to me.

Yep.  We are a long way from doing so, I think.  We might like some kind of
Haskell-flavored fu that lets us relate those things to their operators.  At
least, I’d like to know something about that road ahead, before committing
to the initial contents of int.java.

> What I would explain is "In Java <X it could not be a class because it had no members. Now in Java >=X that it has members and 
> implements `Comparable`, it is a class for that reason, but the type itself 
> is still predefined with or without that class.”

I think people would not be satisfied with such an explanation, until we can
explain why 42.toString() does or doesn’t work, and how 42 < 43 connects
to a call to Comparable.compareTo, and (worst of all) how 1.0 == 1.0 connects
to the Java == operator on classes and/or Comparable.compareTo (pick one).

So we’re pretty far from making int into a class, or from writing int.java.
But, yes, we can say that primitives are (in some sense to be defined or hand
waved away) “inline types”.

> 
> (wart: yeah, arrays have no class, yet sure seem to have members `length` and 
> `clone` anyway. oh well.)
> 
> I also approve of giving the new `int` class everything it needs so that the 
> `Integer` class becomes obsolete; that is, there would no longer be any good 
> reason to use it except when forced by legacy code. (Of course, anything that 
> wants to depend on identity or locking of such an object I will just declare 
> to be legacy code, so it works!) Really though, don't bring `getInteger` over 
> when you do.

This is a maze of twisty passages.  I agree there are ways through it.
We want to choose a way through that doesn’t leave us disgusted with
ourselves in the morning.

> However, I am highly skeptical of attempts to do anything else beyond that. 
> I've seen, at least, the allusions to some kind of aliasing between `int.ref` 
> and `Integer`. That seems unnecessary to me, and more to the point, I feel 
> that it can only make things more confusing to users; that in fact it will 
> cause a large share of all the confusion they do feel. So wait, what IS the 
> wrapper class then? What IS this reference projection then? I see no benefit 
> to blurring that line, at this point.

Interesting.  I don’t have a strong feeling, but I *do* hope that we could 
define
by fiat that Integer is the ref-projection of int, sooner rather than later.
You are prompting me to re-examine this idea, and see what it might buy us.
At the very least, I’d like to be able to say List<int> instead of List<Integer>
and get away with it.  This is touching on exactly what we can do (short
and long term) with generics, which is an open question.

Are you saying that it would be risky to declare that int.ref == Integer, 
because
it would make it harder to get rid of Integer?  Isn’t it going to be impossible
anyway to get rid of Integer?  I think the problem is mainly to make Integer
as palatable as possible in the future, perhaps deprecating some of the oldest
cruft, and (at least conceptually) attributing the useful parts to int, even 
before
we venture to write int.java.

My $0.02.  Thanks for raising the question.

— John

Re: Valhalla basic concepts / terminology

2020-05-22 Thread John Rose
I like this discussion!  Smart questions and solid answers all the way through.

Weaving in my $0.02…

On May 22, 2020, at 12:36 PM, Brian Goetz  wrote:
> 
> Hi Kevin!
>> 
>>  • There are two kinds of objects/instances; the notions "object" and 
>> "instance" apply equally to both kinds. These are "inline objects" and 
>> "identity objects". Statements like "it's an instance, so that means it's on 
>> the heap" and "you can lock on any object" become invalid, but statements 
>> like "42 is an instance of `int`" are valid.
> 
> Correct.  
> 
> From a pedagogical perspective, it's not clear whether we are better off 
> framing it as a partitioning (there are two kinds, red and blue) or that some 
> objects have a special property (in addition to their state, some objects 
> have a special hidden property, its identity.)   
> 
> We have been going down the former path, but I am starting to think the 
> latter path is more helpful; rather than cleaving the world of objects in 
> two, instead highlight how some (many!) objects are "special”.  

This mirrors our voyage through various early versions of the Q+L design,
which surfaced (in the VM, if that’s surfacing) the two colors, on equal 
footing.
Then we went to L-world, which submerged the differences under L, with
Q-types peeking out only when absolutely necessary. This taught us precisely
how similar the two kinds are, and how they can be handled under the L-type
rubric.  At this point we said goodbye to designs which would include
things like Q-Object as the top of all Q’s disjoint from L-Object.  Now we
have a Q-XOR-L design, where every name is either one or the other.

Retaining the previous insights we now know that there are abstract
types (and quasi-abstract Object and maybe others) which admit *values*
of both colors, dynamically distinguished.  After a little more OO analysis,
we realized that identity objects are a sub-type of all possible objects,
because they have extra operations (synch, side effecting state).  The
inline objects are… objects without those operations.  Which in OO
terms is a super-type, not a disjoint type.

In classic OO discourse, you partition the universe of objects into a
bunch of concrete classes, *and* you group those same objects
by their super-classes (and interfaces, etc.).  So the objects (instances,
values even) are always only of one concrete class (what Object::getClass)
but abstractly they match to various other types.  And the OO accounts
confuse us when classes (per se) are used to build both classifications:
The disjoint union of values, and the overlapping classifications of
types.

And when I say “type” (what an overloaded term that is!) I’m talking
mainly about variables, and the value-sets that they are configured
to permit.

In the world of instances, you have identity and inline.  In the world
of variables (types), you can have variables which are exactly tied to
a particular identity or inline class, or which can hold either.
(With variables which might refer to identity objects, null is also
a potential value.  A day may come for types like String! but this
is not that day.)  What about a variable which can hold (a reference
to) an instance of *any* class that is an identity class?  Yes, we want
that because there are special operations on such classes, as noted
above.  We currently inject this type as a magic interface.  What
about a variable which can hold an instance of any class that is
an inline class?  That doesn’t make sense from the POV of types
which support operations common to some set of instances,
because inline classes are simpler (less colorful?) than identity
classes.  Right now, to me, having an InlineObject type makes
as much sense as having a not-List type, whose value set
is everything that *doesn’t* act like a list.  OO hierarchies
are built up additively, guided by functions and contracts;
they are not built on exclusions.  (You can have disjoint
unions, as with sealed classes, so we *could* define some
sort of sealing condition imposed on concrete classes, as
{Object…} = DISJOINT_UNION({null}, {inline…},
{identity…}).  But the use cases we know about aren’t
asking for it, and it adds more complexity to the story
of Java’s type hierarchies without visible benefit.)


> 
>>  • (do we intend to use the term "object", or use the term 
>> "instance", or define the two differently somehow?)
> To the extent we can avoid redefining these things, I think it is easier to 
> just leave these terms in place.

Yes.  AFAICT  “object” and “instance” are synonyms in Java.  We played
around with having the verbal distinction do work for us; maybe an
“object” is really (was all along) an identity object, while an “instance”
could be either.  But the existing usages of those terms don’t happen
to favor any such scheme.  So we coined new terms.

>>  • Identity objects get the benefits of identity, at the cost that you 
>> may only store references to them. They will 

Re: null checks vs. class resolution, and translation strategy for casts

2020-04-10 Thread John Rose
On Apr 10, 2020, at 4:19 AM, fo...@univ-mlv.fr wrote:
> 
>> So, here’s a recommendation:  Use indy, and use a clunkier
>> fallback in the same places that today use a clunkier fallback
>> for string concatenation.  And, record a line item of technical
>> debt that we should further explore indy intrinsics, after we
>> figure out what javac intrinsics look like.
> 
> What is not clear to me is how javac can replace unbox with a null check. For 
> the VM, the input is an interface and the output is an inline type; given 
> that interfaces are not checked until runtime, how can the VM validate that 
> a null check alone is enough?

It can’t; that’s why I’m saying javac needs to ask for a null check,
*and* somehow affirm the inline type (subtype of interface).
This is two bytecodes, invokestatic Objects.requireNN, plus
checkcast C.
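A minimal sketch of the source-level shape this two-bytecode translation corresponds to. `C` is an ordinary class standing in for an inline class, and the helper name is illustrative, not from any real API:

```java
import java.util.Objects;

class C {}

class CastHelper {
    static C unbox(Object ref) {
        Objects.requireNonNull(ref);  // translates to: invokestatic Objects.requireNonNull
        return (C) ref;               // translates to: checkcast C
    }
}
```

The pair gives both effects: null is rejected with NPE, and the verifier sees the result typed as C.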

> Also it's still not clear to me what indy provide in this case.

It provides both of the above effects in one bytecode.  The bytecode,
in turn, can expand to some internal JVM intrinsic which the runtime
will optimize better than a back-to-back combo of the two standard
instructions.  That intrinsic never has to be admitted to by any spec.

> So i still think that doing a checkcast  (reusing checkcast being a trick to 
> avoid to introduce a new bytecode) or having a special unbox opcode is a 
> better idea. 

Changing opcode behaviors and/or adding new opcodes is always
more expensive than appealing to indy, even if we have to add secret
optimizations to indy.  Specs are almost always harder to change than
optimizations.

— John

Re: null checks vs. class resolution, and translation strategy for casts

2020-04-09 Thread John Rose
On Apr 9, 2020, at 2:31 PM, fo...@univ-mlv.fr wrote:
> 
> yes, indy is a way to create any new bytecode, but it also has some 
> restrictions,
> the major one being that you cannot use it before it has been bootstrapped.

Good point; we found that with string concatenation, didn’t we?
If we use indy for this, we’ll run into similar bootstrapping issues.

Which reminds me that Brian has been pondering javac intrinsics
for some time, as a way of replacing method calls that would
ordinarily be linked and run the normal way, with preferable
alternative implementations.  This game could also be played
(very carefully) with BSMs.  That (like javac intrinsics) would
sidestep the usual bootstrapping orders.

So, here’s a recommendation:  Use indy, and use a clunkier
fallback in the same places that today use a clunkier fallback
for string concatenation.  And, record a line item of technical
debt that we should further explore indy intrinsics, after we
figure out what javac intrinsics look like.

— John

Re: null checks vs. class resolution, and translation strategy for casts

2020-04-09 Thread John Rose
On Apr 9, 2020, at 2:07 PM, John Rose  wrote:
> 
> Perhaps we want another (intrinsically optimized) version
> of Objects::requireNonNull, which takes a second argument
> that assists in generating a better diagnostic.

(D’oh; there it stands in the JDK already.)



Re: null checks vs. class resolution, and translation strategy for casts

2020-04-09 Thread John Rose
On Apr 9, 2020, at 1:20 PM, John Rose  wrote:
> 
> No specs were harmed in making this proposal.

P.P.S. Although there’s no precedent yet for it except static
code rewriters, we could also intrinsify certain indy instructions
in the same way, as early as the interpreter.  Then we’d have
customized verifier rules, based on each indy instruction signature,
at no runtime cost, even at startup, thanks to the intrinsification
logic.  There are lots of ways to skin this… orange.



Re: null checks vs. class resolution, and translation strategy for casts

2020-04-09 Thread John Rose
On Apr 9, 2020, at 1:03 PM, Brian Goetz  wrote:
> 
> 
>> I have a proposal for a translation strategy:
> 
> Casts to inline classes from their reference projections will be frequent.  
> Because the reference projection is sealed to permit only the value 
> projection, a cast is morally equivalent to a null check. We want to preserve 
> this performance model, because otherwise we're reinventing boxing.
> 
> Going through `ldc X.class / invokevirtual Class.cast` will surely be slow in 
> the interpreter, but also risks being slow elsewhere (as do many of the other 
> options.)
> 
> So let me add to your list: is it time for a `checknonnull` bytecode, which 
> throws NPE if null, or some other more flexible checking bytecode?  
> (Alternatively, if we're saving bytecodes: `invokevirtual 
> Object.<m>`, where <m> is a fake method that always links to 
> a no-op, but invokevirtual NPEs on a null receiver.)

Um, this feels a lot like a premature optimization.  Let’s not add
`checknonnull` intrinsics to the interpreter (the very most
expensive way to do it) until we have tried the other alternatives
(Objects.requireNonNull, etc.) and have proven that the costs
are noticeable.  And a spec EG is not the place to evaluate such
questions; it has to be demonstrated in a prototype.

I see now why you are angling for verifier rules that know about
sealing relations.  I think that also is a premature optimization.
Actually, verifier rules (not interpreter bytecodes) are the most
costly way to get anything done.

Sorry to be a party pooper here, but that’s how it looks right now.

— John




Re: null checks vs. class resolution, and translation strategy for casts

2020-04-09 Thread John Rose
Correction…  The recommended reflective approach has
a flaw (easily fixed), which makes indy my real recommendation.

On Apr 8, 2020, at 11:43 AM, John Rose  wrote:
> …
> I have a proposal for a translation strategy:
> 
> 1. Translate casts to inline classes differently from “classic”
> casts.  Add an extra step of null hostility.  For very low-level
> reasons, I suggest using “ldc X” followed by Class::cast.
> 
> Generally speaking, it’s a reasonable move to use reflective
> API points (like Class::cast) on constant metadata (like X.class)
> to implement language semantics.

This suggestion is incomplete.  If the result of the cast is
going to be used as type X, then the verifier must be
pacified by adding `checkcast X`.  Basically, you have
to do both reflective and intrinsic cast operations, if you
need to get the verifier on board, as well as do a null
check.  That tips me over to recommending indy instead,
which was #2.  Indy, that Swiss army knife of an instruction,
can get it done in one.

> The following alternatives are also possible; I present them
> in decreasing order of preference:
> 
> 2. Use invokedynamic to roll our own instruction.  It will
> be a trivial BSM since we are really just doing an asType
> operation.  But I think this is probably overkill, despite
> my fondness for indy.

For a conversion to type X, where X may be a null-hostile
inline type (or any type whose semantics is not exactly
covered by native checkcast), a single invokedynamic
instruction will cover the operational semantics
required and will also feed the right type to the verifier.
It will have this signature:

  (Object) => X

It will have a utility bootstrap method which materializes
conversions, basically riffing on MethodHandles::identity
and asType.  (Not MethodHandles::explicitCastArguments,
because we are concerned with checked reference conversions.)

It will have *no extra arguments* (not even X.class), because
the BSM can easily derive X.class from the return type of
the method type signature passed to the BSM.

ConstantCallSite convertBSM(Lookup ig1, String ig2, MethodType mt) {
  var mh = MethodHandles.identity(Object.class).asType(mt);
  return new ConstantCallSite(mh);
}

As such, it is a candidate for proposed simplifications to
bootstrap method configuration (but not the simplest such
simplifications, because of the need to feed X.class into
the linkage logic).

MethodHandle simplifiedConvertBSM() {
  return MethodHandles.identity(Object.class);
}

(At some point I should write up those simplifications,
shouldn’t I?)

— John

access control for withfield bytecode, compared to putfield

2020-04-08 Thread John Rose
In the Java language fields can be final or not, and independently
can be access controlled at one of four levels of access:  public,
protected, package, and private.

Final fields cannot be written to except under very narrow
circumstances:  (a) In an initialization block (static initializer
or constructor body), and (b) only if the static compiler can
prove there has been no previous write (based on the rules
of the language).
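Those "narrow circumstances" can be seen in a small illustration: the language's definite-assignment rules require exactly one write to a final field on every constructor path.

```java
// Illustration of the source-level final-field rules described above.
class Once {
    final int x;
    Once(boolean b) {
        if (b) x = 1;
        else   x = 2;   // ok: x is assigned exactly once on each path
        // x = 3;       // would not compile: x might already be assigned
    }
}
```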

We are adding inline classes, whose non-static fields are always
final.  (There are possible meanings for non-final fields of inline
classes, but nothing I’m saying today interacts or interferes
with any known such meanings.)  Behaviorally, an inline class
behaves like a class with all-final non-static fields, *and* it has
its identity radically suppressed by the JVM.  In the language,
a constructor for an inline class is approximately indistinguishable
from a constructor for a regular class with all-final non-static fields.
In particular, a constructor of any class (inline or regular identity)
is empowered, by the rules of the language, to set each of its (final,
non-static) fields exactly once along any path through the constructor.

All of this hangs together nicely. When we translate to the JVM,
the reading of any non-static field always uses the getfield instruction,
and the access checks built into the JVM enforce the language access
rules for that field—and this is true equally for inline and identity
classes (the JVM doesn’t care).  However, we have to use distinct
tactics for translating assignments to fields.  The existing putfield
instruction has no possible applicability to inline classes, because
it assumes you can pass it an instance pointer, execute it, and the
*same instance pointer* will refer to the updated instance.  This
cannot possibly work with inline classes (unless we add a whole
new layer of “larval” states to inline classes—which would not be
thrifty design).

Instead, setting the field of an inline class needs a new bytecode,
a new sibling of getfield and putfield, which we call withfield.
Its output is a new instance of the same inline class whose
field values are all identical to those in the old instance, except
for the one field referred to by the withfield instruction.  Thus:

* getfield consumes a reference and returns a value (I) → (F)
* putfield consumes both and returns a side effect (I F) & state → () & state′
* withfield consumes same as putfield and produces a new instance (I F) → (I′)
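The withfield effect can be sketched at source level as a "wither" on an all-final class. Point here is a stand-in for an inline class, and the method name is illustrative; a real inline class would get this behavior from the withfield bytecode itself:

```java
// Sketch: withfield copies all fields, replaces one, and yields a new instance.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    // withfield-style update: the original instance is untouched
    Point withY(int newY) { return new Point(this.x, newY); }
}
```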

The access checking rules are fairly uniform for all of these
instructions.  If the field F of C has protection level P, unless a client
has access to level P of C, then it cannot execute (cannot even resolve)
the instruction that tries to access F.  In the case of putfield or
withfield, if F is final (and for withfield that is currently always
the case, though that could change), then an additional check
is made, to ensure that F is only being set in a legitimate context.
More in a moment on what “legitimate” means for this “context”.
The getfield instruction only has to pass the access check, and then
the client has full access to read the value of the field.  This works
pleasingly like the source-level expression which fetches the field
value.

Currently, for a non-static final field, both “putfield” and “withfield”
are generated only inside of constructors, which have rigid rules,
in the source language, that ensure nothing too fishy can happen.

For an identity class C, it would be extremely fishy if the classfile of
C were able to execute putfield instructions outside of one of C’s
constructors.  The reason for this is that a constructor of C would
be able to produce a supposedly all-final instance of C, but then
some other method of C would (in principle) be able to overwrite
one of C’s supposedly final fields with some other value, by executing
a putfield instruction in that other method.  Now, the JVM doesn’t
fully trust final fields even today (because they change state at most
once from default to some other value), but if maliciously spun
classfiles were able to perform “putfield” at will on fully constructed
objects, it might be possible to create paradoxes that could lead
to unpredictable behavior.  For this reason, not only doesn’t the
JVM fully trust final fields, but it also forbids classes from executing
putfield on their own final fields, except inside of constructors.
In essence, putfield on a final field is a special restricted operating
mode of putfield which has unusually tight restrictions on its
execution.  In this note I’d like to call it out with a special name,
putfield-on-a-final.

Note that the JVM does *not* fully enforce the Java source language
rules for field initialization:  At the JVM level, a constructor can
run putfield-on-a-final, on some given field, zero, one, or many
times, where the Java language requires at most one, and exactly
one on normal exits.  The JVM simply provides a reasonable backstop
check, 

ClassValue performance model and Record::toString

2020-04-08 Thread John Rose
This note is prompted by work in a parallel project, Amber,
on the implementation of record types, but is properly a JVM
question about JSR 292 functionality.  Since we’ve got a quorum
of experts here, and since we briefly raised the topic this morning
on a Zoom chat, I’ll raise the question here of ClassValue performance.
I’m BCC-ing amber-spec-experts so they know we are talking
about this.  (In fact the EGs overlap.)

JSR 292 introduced ClassValue as a hook for libraries (especially
dynamic language implementations) to efficiently store library
specific metadata on JVM classes.  A general use case envisioned
was to store method handles (or tuples of them) on classes, where
a lazy link step (tied to the semantics of ClassValue::get) would
materialize the required MHs as needed.  A specific use case was
to be able to create extensible v-table-like structures, where a
CV would embody a v-table position, and each CV::get binding
would embody a filled slot at that v-table position, for a particular
class.
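A minimal sketch of that envisioned use, assuming nothing beyond the JSR 292 ClassValue API itself: lazily attach per-class metadata (here just a tag string) to arbitrary classes, including system classes like String.

```java
final class LibraryMeta {
    static final ClassValue<String> TAG = new ClassValue<String>() {
        @Override protected String computeValue(Class<?> type) {
            // computed at most once per class, then cached by the JVM
            return "meta:" + type.getName();
        }
    };
}
```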

The assumption was that dynamic languages using CV would
continue to use the JVM’s built-in class mechanism for part or
all of their own types, and also that it would be helpful for a
dynamic language to adjoin metadata to system classes like
java.lang.String.  Both tactics have been used in the field.
In the future, template classes may provide an even richer
substrate for the types of non-Java languages.

JSR 292 was envisioned for dynamic languages, but was built
according to the inherent capabilities of the JVM, and so
eventually (actually, in the next release!) it has been used
for Java language implementations as well (indy for lambda).
ClassValue has not yet been used to implement Java language
features, but I believe the time may have come to do so.

The general use case I have in mind is an efficient translation
strategy for generic algorithms, where the genericity is in the
receiver type.  The specific use case is the default toString method
of records (and also the equals and hashCode methods).

The logic of this method is generic over the receiver type.
For each record type (unless that record type overrides its
toString method in source code), the toString method is
defined to iterate over the fields of the record type, and
produce a printed representation that mentions both the
names and values of the fields.  The name of the record’s
class is also mentioned.

If you ask an intermediate Java coder for an implementation
of this spec., you will get something resembling an interpreter
which walks over the metadata of “this.getClass()” and collects
the necessary strings into a string builder.
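A hedged sketch of that interpreter-style implementation (names and shape mine, not from any JDK source): walk this.getClass() metadata and join field names and values.

```java
import java.lang.reflect.Field;
import java.util.StringJoiner;

final class NaiveFormat {
    static String format(Object r) {
        Class<?> c = r.getClass();
        StringJoiner sj = new StringJoiner(", ", c.getSimpleName() + "[", "]");
        try {
            for (Field f : c.getDeclaredFields()) {
                f.setAccessible(true);
                sj.add(f.getName() + "=" + f.get(r));   // name=value for each field
            }
        } catch (IllegalAccessException e) {
            throw new AssertionError(e);
        }
        return sj.toString();
    }
}
```

Every call re-runs the reflective walk, which is exactly the performance trap the next paragraph describes.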

If you then deliver this code to users, after about a microsecond
you will get complaints about its performance.  We’re old hands
who don’t fall for such traps, so we asked an experienced coder
for better code.  That code runs the interpreter-like logic once
per distinct record type, collecting the distinct field accesses
and folding up the string concatenations into a spongy mass
of method handles, depositing the result in a cache.  That’s
better!

(Programming with method handles is, alas, not an improvement
over source code.  Java hasn’t found its best self yet for doing partial
evaluation algorithms, though there is good work out there, like
Truffle.)

In order not to have bad performance numbers, we are also
preconditioning the v-table slot for each record’s toString
method, as follows:

0. If the record already has a source-code definition, do nothing
special.

1. Otherwise, synthesize a synthetic override method to
Object::toString which contains a single indy instruction.
(There is also data movement via aload and return.)

2. Set up the indy to run the fancy partial MH-builder mentioned
above, the first time, and use the cached MH the second time.

3. Profit.

In essence, toString works like a generic algorithm, where the
generic type parameter is the receiver type.  (If we had template
methods we’d have another route to take but not today…)

This works great.  But there’s a flaw, because it doesn’t use ClassValue.
As far as I can tell, it would be better for the translation strategy to
*not* generate synthetic methods, but instead to put steps 1. and 2.
above into a plain old Java method called Record::toString.  This
method would call x=this.getClass() and then y=R_TOSTRING.get(x)
and then y.invokeExact(this).
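A sketch of that shape, with the fancy MH-builder replaced by a trivial formatter so the caching structure stands out. R_TOSTRING and the invokeExact call are from the description above; the rest is assumed scaffolding.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class RecordToString {
    // one lazily-computed, cached MethodHandle per receiver class
    static final ClassValue<MethodHandle> R_TOSTRING = new ClassValue<MethodHandle>() {
        @Override protected MethodHandle computeValue(Class<?> type) {
            try {
                // a real build would fold field getters and string concat;
                // this stand-in just shows where that MH would be cached
                return MethodHandles.lookup().findStatic(
                    RecordToString.class, "slowFormat",
                    MethodType.methodType(String.class, Object.class));
            } catch (ReflectiveOperationException e) {
                throw new AssertionError(e);
            }
        }
    };

    static String slowFormat(Object r) { return r.getClass().getSimpleName() + "[...]"; }

    static String defaultToString(Object self) {
        try {
            return (String) R_TOSTRING.get(self.getClass()).invokeExact(self);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```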

Non-use of CV is not the flaw, it’s the cause of the flaw.  The
flaw is apparent if you read the javadoc for Record::toString.
It doesn’t say there’s a method there (because there isn’t) but
it says weaselly stuff about “the default method provided does
this and that”.  In a purely dynamic OOL, the default method
is just method bound to Record::toString, and it’s active as
long as nobody overrides it (or calls super.toString).  People
spend years learning to reason about overrides in OOLs like
Java, and we should cater to that.  We could in this case, but
we don’t, because we are pulling a 

null checks vs. class resolution, and translation strategy for casts

2020-04-08 Thread John Rose
The latest translation strategies for inline classes involve
two classfiles, one for the actual inline class C and one for
its reference projection N.  The reference projection N exists
to provide a name for the type “C or null”.  As we all know
on this list, this is a surprisingly pleasant way to handle the
problem of representing the two types.

(This was a surprise to me; I had assumed from the beginning
of the project that our build-out of new descriptors, including
Q-types, would inevitably provide the natural way for the JVM
to express null vs. non-null versions of the same nominal class.
But this failed to correspond to a language-level type system
that was workable, and also broke binary compatibility in some
cases where we wished to migrate old L-types to new Q-types.
Having names for both C and N fixes both problems, with
surprisingly little cost to the JVM’s model of types.)

But there’s a problem in this translation strategy with null values
which needs resolution.  To be quite precise, this problem requires
careful non-resolution.  The issue is the exact sequencing of the
JVM’s intrinsic null-checking operations as applied to types
which may or may not be inline classes.

(One of the delights of working on a compatible language a
quarter century old is that there’s always more to the story,
because however simple your model of things might seem,
there’s always some 25-year-old constraint you have to cope
with, that adds surprising complexity to your simple mental
model.  Today’s topic is null checking of the instanceof and
checkcast instructions, which we just discussed in a Zoom
meeting—special thanks to Dan H. and Remi and Fred P. for
guiding me in this topic.)

The static operand of an instanceof or checkcast instruction
indexes a constant pool entry of type CONSTANT_Class_info
(defined in JVMS §4.4.1).  Such a C_Class entry is resolvable.
Indeed, bytecodes that use such an entry are specified to
resolve it first, which, may cause a cascade of side effects
including loading a classfile, if it has not already been loaded.
§6.5 says this about instanceof (I have added numbers but
nothing else):

> 1. The run-time constant pool item at the index must be a symbolic
> reference to a class, array, or interface type.
> 
> 2. If objectref is null, the instanceof instruction pushes an int
> result of 0 as an int onto the operand stack.
> 
> 3. Otherwise, the named class, array, or interface type is resolved
> (§5.4.3.1).

The corresponding documentation for checkcast is identical
except (as you might expect) for this step:

> 2. If objectref is null, then the operand stack is unchanged.


Step 1 says, “you must point to a C_Class”.  This is checked
when the class file containing the instruction is loaded.  This
step does *not* call for any classfiles to be loaded.

Step 2 handles the null case.

Step 3 requires that the C_Class reference be resolved, so that
the resolved class can be used to finish the instruction.  The
next step (4) is not so important but I’ll include it here for
completeness, for both instanceof and checkcast:

> 4. If objectref is an instance of the resolved class or array type, or
> implements the resolved interface, the instanceof instruction pushes
> an int result of 1 as an int onto the operand stack; otherwise, it
> pushes an int result of 0.

> 4. If objectref can be cast to the resolved class, array, or interface
> type, the operand stack is unchanged; otherwise, the checkcast
> instruction throws a ClassCastException.

Notice that if the object reference on the stack is null then step
2 finishes the instruction, and step 3 is not executed to load the
referenced class (nor is step 4 executed).
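The step-2 short-circuit is visible directly at source level today: instanceof on null answers false, and a cast of null always succeeds, neither outcome requiring the named class to be resolved.

```java
// Demo of today's null behavior for instanceof and checkcast.
class NullChecks {
    static boolean isString(Object o) { return o instanceof String; }
    static String castString(Object o) { return (String) o; }  // null passes through
}
```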

This is a little bit inconvenient in the case of a checkcast to an
inline class type.  The Java language requires that a cast to an
inline class must always fail on null, while a cast to a regular
identity class must always succeed on null.  (If we ever add
other null-rejecting types to the language, similar points will
hold for their casts.)  This means that checkcast is not exactly
right as a translation for source-level cast to an inline type.

You might think the ordering of steps 2 and 3 is an unimportant
optimization:  Why bother to do the work of loading the class if
you know the outcome of the instruction (because the operand
happens to be null)?  It’s a little more than an optimization, though.
What would happen if we were to switch the order of steps 2 and
3, so that the class is always loaded?  Could we switch the order
of checks in the JVM, moving forward from here, so that the
Java language compiler can use checkcast to translate inline
type casts?  Or, does it even matter; why not just translate with
the existing instruction even if it does let nulls through?

First, the existing behavior is important, to some extent.
If we were to switch steps 2 and 3, existing programs would
change their behavior during bootstrapping (class loading).
Suppose some class X is 

Re: for review: 8236522: "always atomic" modifier for inline classes to enforce atomicity

2020-03-07 Thread John Rose
On Mar 7, 2020, at 2:22 PM, fo...@univ-mlv.fr wrote:
> 
> Marker interface are usually problematic because:
> - they can be inherited; for inline classes, you can put them on our new kind of 
> abstract class, which will make things harder to diagnose.

As always the flexibility of inheritance cuts both ways.
Suppose I define AbstractSlice with subtypes MemorySlice,
ArraySlice, etc. and I intend it for secure applications.
I then mark AbstractSlice as NonTearable, and all its subs
are therefore also NonTearable.  You cannot do that with
an ad hoc keyword, even if you want to.  You have to make
sure that every concrete subtype mentions the keyword.

It’s a trade-off, of course, but for me the cost of a new keyword
pushes me towards using types, making the property inherited.
It’s a decision which falls squarely in the center of the language.

> - they can be used as a type, like Serializable is used where it should not be.
>  For example, what does an array of java.lang.NonTearable mean, exactly? There is 
> potential for a lot of confusion. 

Again, if I have an algorithm that works for a range of value
types (via an interface or abstract super), I can express the
requirement that the inputs to the algorithm be non-tearable,
using subtypes.  For example, the bound (Record & NonTearable)
expresses and enforces the intention that the algorithm will
operate on non-tearable record values.
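A sketch of that intersection bound. java.lang.NonTearable was only a draft at the time, so a local stand-in marker interface is declared here:

```java
// stand-in for the proposed java.lang.NonTearable marker interface
interface NonTearable {}

record Pt(int x, int y) implements NonTearable {}

class Bounds {
    // accepts only record values that also promise non-tearability
    static <T extends Record & NonTearable> String describe(T v) {
        return v.toString();
    }
}
```

A call like `Bounds.describe(new Pt(1, 2))` type-checks, while a record that does not implement the marker is rejected at compile time.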


> and in the specific case of NonTearable, a non-inline class can implement it, 
> again creating confusion.

The confusion comes from the incomplete story here.  I’d like to
suggest that IdentityObject implements NonTearable, so that
bounds like Record & NonTearable allow identity and inline
objects.

— John

Re: for review: 8236522: "always atomic" modifier for inline classes to enforce atomicity

2020-03-07 Thread John Rose
On Mar 7, 2020, at 1:41 PM, Remi Forax  wrote:
> 
> [Moving to valhalla-spec-experts]
> 
> ----- Original Message -----
>> De: "John Rose" 
>> À: "Tobias Hartmann" 
>> Cc: "valhalla-dev" 
>> Envoyé: Vendredi 21 Février 2020 11:23:14
>> Objet: Re: for review: 8236522: "always atomic" modifier for inline classes 
>> to enforce atomicity
> 
>> I’ve come back around to this feature, after (SMH) realizing
>> it should be a marker interface (java.lang.NonTearable) instead
>> of a new modifier and access flag.  Thanks, Brian…
> 
> In my opinion, using an annotation here crosses the Rubicon;
> user-available annotations are not supposed to change the semantics of a 
> language construct, they are metadata.

I agree.

> Do you really want the memory model to make reference of an annotation ?

No.

> Or worse, do you think you can avoid changing the JMM to describe the 
> effects of always-atomic inline classes?

The JMM has to change, but it doesn’t have to mention any Java syntax.
I’m anticipating that we adjust the existing language about tearing of
longs and doubles to cover values also.

> What is the reason to change/move the line here ?
> Are you envisioning other annotations that can change the semantics ?
> 
> Why can it not be a keyword at source level and an attribute at classfile level?


So, that’s a different question.  It shouldn’t be that either, IMO, because
that is also disruptive to the specifications.  Keywords and attributes
cost a lot to specify and implement, right?  I prototyped a keyword,
and backed away to a marker interface, which I think is the right answer.

So: not an annotation, not a keyword, but a marker interface.
It is certainly the path of least resistance for a prototype, so we’re
doing that right now.  It might even be the right answer in the long
run.  Here’s the current draft java doc for java.lang.NonTearable:

> An inline class implements the {@code NonTearable} interface to
> request that the JVM take extra care to avoid structure tearing
> when loading or storing any value of the class to a field or array
> element.  Normally, only fields declared {@code volatile} are
> protected against structure tearing, but a class that implements
> this marker interface will never have its values torn, even when
> they are stored in array elements or in non-{@code volatile}
> fields, and even when multiple threads perform racing writes.
> 
>  An inline instance of multiple components is said to be "torn"
> when two racing threads compete to write those components, and one
> thread writes some components while another thread writes other
> components, so a subsequent observer will read a hybrid composed,
> as if "out of thin air", of field values from both racing writes.
> Tearing can also occur when the effects of two non-racing writes
> are observed by a racing read.  In general, structure tearing
> requires a read and two writes (initialization counting as a write)
> of a multi-component value, with a race between any two of the
> accesses.  The effect can also be described as if the Java memory
> model broke up inline instance reads and writes into reads and
> writes of their various fields, as it does with longs and doubles
> (JLS 17.7).
> 
>  In extreme cases, the hybrid observed after structure tearing
> might be a value which is impossible to construct by normal means.
> If data integrity or security depends on proper construction,
> the class should be declared as implementing {@code NonTearable}.
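A hedged illustration of the long/double analogy in that draft (JLS 17.7): a plain long field may legally be split into two 32-bit writes on some JVMs, while volatile forbids it. This only declares the two shapes; actually observing tearing requires a racing test on such a JVM.

```java
class WordTearing {
    long plain;          // reads/writes may be split per JLS 17.7
    volatile long safe;  // volatile longs and doubles never tear
}
```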

Also, we could add a paragraph giving full disclosure about the
meaninglessness of having non-inlines implement NonTearable,
or maybe in the end we can position NonTearable in a
place where non-inlines cannot implement it.  I’m inclined to
leave it out for now, at least until we figure out the shape of the
type hierarchy immediately under Object.

Comments?

— John

Re: Reference-default style

2020-02-09 Thread John Rose
Good point. For our purposes the abstract ctor must always resolve to Object::<init>. 
And it must have the empty signature, right?

A small remaining point: There might be other use cases in the future for other 
configurations which make logical sense in the same way. If so we can expand 
the permissions to other constructors besides Object::()V. For now that’s 
the only one we care about delegating to. 

Have I got it now?

On Feb 9, 2020, at 7:26 AM, Brian Goetz  wrote:
> 
> I think what dan is saying is that you are positing a degree of freedom 
> that is unnecessary.  We want to have abs classes that can be a base for both 
> inline and idents.  An abstract ctor can be the indicator of this.  But, 
> why bother with allowing such a class to extend one that doesn’t meet the 
> same requirements?  They will be useless for inlines anyway.  Require that 
> the ctors be “abstract all the way up.”  
> 
> Sent from my iPad
> 
>> On Feb 9, 2020, at 1:08 AM, John Rose  wrote:
>> 
>>> On Feb 8, 2020, at 9:08 PM, Dan Smith  wrote:
>>> 
>>> Oh, yeah, if we need to make sure that code gets executed (for identity 
>>> classes), that will affect the design.
>> 
>> That’s the root of the stuff you found perhaps unnecessary.
>> It could be done the way you propose also, but adding the
>> ability of the invokespecial to turn into a “pop”, and dealing
>> with the loss of Object:: as a handy point of reference,
>> makes for a different, less regular set of JVM changes.
>> 
>> I could go either way on having Object:: changed to
>> be abstract, but I think it’s safer to leave it exactly as is,
>> and then just say “inlines never get there”.
>> 
>> — John


Re: Reference-default style

2020-02-08 Thread John Rose
On Feb 8, 2020, at 9:08 PM, Dan Smith  wrote:
> 
> Oh, yeah, if we need to make sure that code gets executed (for identity 
> classes), that will affect the design.

That’s the root of the stuff you found perhaps unnecessary.
It could be done the way you propose also, but adding the
ability of the invokespecial to turn into a “pop”, and dealing
with the loss of Object:: as a handy point of reference,
makes for a different, less regular set of JVM changes.

I could go either way on having Object:: changed to
be abstract, but I think it’s safer to leave it exactly as is,
and then just say “inlines never get there”.

— John

Re: Reference-default style

2020-02-07 Thread John Rose
On Feb 7, 2020, at 3:39 PM, Brian Goetz  wrote:
> 
> 
>> (To remind everyone:  We are using two half-buckets rather than one bucket
>> mainly so that Optional can be migrated.  If it were just supporting V? then
>> we’d use an empty marker type, I think, probably just an interface.)
> 
> The two half buckets also exist because it is how we get primitives and 
> inlines to be the same thing, and not end up with THREE kinds of types.

Good point; thanks.  In the case of primitives, it might turn out to be
more than two half-buckets, depending on if and how we choose to
support identity-bearing primitive wrappers (today’s new Integer(42)).

Towards the baroque end of things, I can imagine a three-bucket solution:

nest {
  abstract class Integer permits intThePrimitive, intTheBox {
    …migration support here…
  }
  inline class intThePrimitive extends Integer {
    …
  }
  final class intTheBox extends Integer {
    private final intThePrimitive value;
    ...
  }
}



Re: Reference-default style

2020-02-07 Thread John Rose
(Replying second to your first.)

On Feb 7, 2020, at 9:34 AM, Brian Goetz  wrote:
> 
> I want to combine this discussion with the question about whether inline 
> classes can extend abstract classes, and with the "reference projection" 
> story in general.  

Thanks for your “John would say…”; I’ll take those as already said!

The only other thing is I guess I prefer your Translation D over
Translation E, and either of those over the other options.

That is, given two half-buckets, put every API chunk in exactly one
half, and put most of the chunks in the side of the bucket with
the good name.  (And have Core Reflection copy all chunks to
both names.)

— John

Re: Reference-default style

2020-02-07 Thread John Rose
On Feb 7, 2020, at 2:39 PM, Dan Smith  wrote:
> 
>> On Feb 7, 2020, at 3:05 PM, Brian Goetz  wrote:
>> 
>> - Supertypes, methods (including static methods, and including the static 
>> "constructor" factory), and static fields are lifted onto the ref projection.
> 
> This deserves highlighting: in theory, there's nothing wrong with putting all 
> of an inline class's methods in an abstract superclass. In practice, I'm 
> curious to know whether we take a performance hit, and if so whether it's 
> straightforward to optimize away.
> 
> (And if there *is* a significant performance penalty, is it so bad that it's 
> impractical to *ever* make serious use of superclass or superinterface 
> methods in performance-sensitive contexts? That may be okay, but it's 
> something authors will want to understand.)

Putting something in a surprising place has surprising penalties, sometimes.
But, the JVM is designed from the ground up to manage single inheritance
code sharing extremely well.  I don’t think there’s much of a performance
penalty here for putting code on a superclass rather than a subclass.

That said, putting code in a surprising place sometimes leads to surprises.

Using interface default methods is significantly riskier, as I implied in my
previous comment.

> I also kind of wonder if there are some code paths in Hotspot that will never 
> get seriously exercised if javac never invokes a method declared in an inline 
> class. (Not really an argument against this strategy, just an observation.)

Good observation.  We could see “surprises”.

Basically, the two classfiles are logically one compilation unit, so it should
be OK to put anything from the source file in either half-bucket.

(To remind everyone:  We are using two half-buckets rather than one bucket
mainly so that Optional can be migrated.  If it were just supporting V? then
we’d use an empty marker type, I think, probably just an interface.)

I’m not sure exactly why I feel queasy about hollowing out the V.ref,
and that’s probably because my “anti-surprise heuristic” (aka. spidey
sense) is responsible, and such heuristics can’t ever say much more
than “I’ve got a bad feeling about this”.  But a rarely used code path
is one door that surprises can pop through.  :-S

— John




Re: Reference-default style

2020-02-07 Thread John Rose
On Feb 7, 2020, at 2:05 PM, Brian Goetz  wrote:
>> 
>> So, summary:
>> 
>>  - Yes, we should figure out how to support abstract class supertypes of 
>> inline classes, if only at the VM level;
>>  - There should be one way to declare an inline class, with a modifier 
>> saying which projection gets the good name;
>>  - Both the ref and val projections should have the same accessibility, in 
>> part so that the compiler can freely use inline widening/narrowing as 
>> convenient;
>>  - We would prefer to avoid duplication of the methods on both projections, 
>> where possible;
>>  - The migration case requires that, for ref-default inline classes, we 
>> translate so that the methods appear on the ref projection.

Abstract classes, check.  User control over good name, check.
Co-accessibility of both projections, check.  No schema duplication,
check.  Methods on ref projection for migration, check.  Awesome!

I’m relieved that we are embracing abstract classes, because (a) the
JVM processes them a little more easily than interfaces, and (b) they
have fewer nit-picky limitations than interfaces (toString/equals/hashCode,
package access members).  Thanks, Dan and whoever else agitated for
abstract classes; the JVM thanks you.

I have a tiny reservation about the co-accessibility of both projections,
although it’s a good principle overall.  There might be cases (migration
and maybe new code) where the nullable type has wider access than
the inline type, where the type’s contract somehow embraces nullability
to the extent that the .val projection is invisible.  But we can cross that
bridge when and if we come to it; I can’t think of compelling examples.

> Let me flesh this out some more, since the previous mail was a bit of a 
> winding lead-up.  
> 
>  Abstract class supertypes
> 
> It is desirable, both for the migration story and for the language in 
> general, for inline classes to be able to extend abstract classes.  There are 
> restrictions: no fields, no constructors, no instance initializers, no 
> synchronized methods.  These can be checked by the compiler at compile time, 
> and need to be re-checked by the VM at load time.

(Nitpick:  The JVM *fully* checks synchronization of such things dynamically;
it cannot fully check at load time.  Given that, it is not a good idea to
partially check for evidence of synchronization; that just creates the
semblance of an invariant where one does not exist.  The JVM tries hard to
make static checks that actually prove things, rather than just “catch user
errors”.  So, please, no JVM load-time checks for synchronized methods,
except *maybe* within the inline classes themselves.)

> The VM folks have indicated that their preferred way to say "inline-friendly 
> abstract class" is to have only a no-arg constructor which is ACC_ABSTRACT.  
> For abstract classes that meet the inline-friendly requirement, the static 
> compiler can replace the default constructor we generate now with an abstract 
> one.  The VM would have to be able to deal with subclasses doing 
> `invokespecial <init>` super-calls on these.  

More info, from a JVM perspective:

In that case, and that case alone, the JVM would validly look up the superclass
chain for a non-abstract <init> method, and link to that instead.  This is a
very special case of inheritance where a constructor is inherited and used
as-is, rather than wrapped by a subclass constructor.  It’s a valid operation
precisely because the abstract constructor is provably a no-op.  The Object
constructor is the initial point of this inheritance process, and the end of
the upward search.  I’m leaning towards keeping that as non-abstract, both for
compatibility, and as a physical landing place for the upward search past
abstract constructors.  For inlines, we say that the inline class constructor
is required to inherit the Object constructor, with no non-abstract
constructors in intervening supers, and furthermore that the JVM is allowed to
omit the call to the Object constructor.  This amounts to a special pleading
that “everybody knows Object.<init> does nothing”.  Actually in HotSpot it
does something:  For a class with a finalizer it registers something
somewhere.  But that’s precisely irrelevant to inlines.
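To make the shape concrete, here is a hypothetical-syntax sketch (not compilable today; `inline` classes and abstract constructors are Valhalla drafts, and the names are illustrative) of the linkage just described:

```
abstract class InlineFriendly {        // inline-friendly abstract super
    public abstract InlineFriendly();  // classfile: <init>()V with ACC_ABSTRACT
}

inline class Point extends InlineFriendly {
    // javac still emits:  invokespecial InlineFriendly.<init>()V
    // The JVM searches up past abstract <init> methods, landing on the
    // non-abstract Object.<init>()V -- or elides the call entirely, since
    // every <init> on the path is provably a no-op.
}
```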

> 
> My current bikeshed preference for how to indicate these is to do just the 
> test structurally, with good error messages, and back it up with annotation 
> support similar to `@FunctionalInterface` that turns on stricter type 
> checking and documentation support.  (The case we would worry about, which 
> stronger declaration-site indication would help with, would be: a public 
> accidentally-inline-friendly abstract class in one maintenance domain, 
> extended by an inline class in another maintenance domain, and then 
> subsequently the abstract class is modified to, say, add a field.  This could 
> happen, but likely would not happen that often; we can warn users of the 
> risks by additionally issuing a warning on the 

Re: atomicity for value types

2020-01-14 Thread John Rose
On Jan 14, 2020, at 4:52 PM, Remi Forax  wrote:
> 
> In the context of Java, we are already using the term 'atomic', in 
> AtomicInteger, AtomicLong, etc,

Even more fundamentally, the term “atomic” is in the JLS, JVMS, and JMM
with the same meaning being proposed here, and *not* subsumed by nor identical
with “volatile”.

JVMS 4:

> Untyped instructions that manipulate the operand stack must treat values of 
> type long and double as atomic (indivisible).

JLS 17.7. Non-Atomic Treatment of double and long:

> Writes and reads of volatile long and double values are always atomic… Writes 
> to and reads of references are always atomic…

And JMM has more of the same.

> and in that case the semantics is volatile + atomic operations (at least 
> CAS), so i think using atomic or any keyword derived from it will not help to 
> our users to understand the semantics for an inline class.

So volatile is associated with atomic, but it is not identical.
You came up with an interesting example of that association,
the AtomicLong API, but it’s just an association.  Nobody will be
confused.
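A small sketch of the distinction (class and method names are illustrative): `volatile` gives atomic *access* plus ordering, while `AtomicLong` layers atomic *read-modify-write* operations on top of that, which is the “volatile + atomic operations” pairing mentioned above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Three related senses of "atomic" in today's Java.
class AtomicSenses {
    long plainLong;              // JLS 17.7: a racy read of this MAY tear
    volatile long volatileLong;  // volatile: access is atomic AND ordered

    // AtomicLong adds atomic read-modify-write (e.g. CAS) on top of
    // volatile-strength access.
    final AtomicLong counter = new AtomicLong();

    long demo() {
        volatileLong = 42L;                  // indivisible 64-bit store
        counter.compareAndSet(0L, 7L);       // atomic CAS: 0 -> 7
        return volatileLong + counter.get(); // 49
    }
}
```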

> As Doug said, for a VarHandle, the semantics we want is called opaque, so 
> "opaque" is ok for me. 

Opaque has meaning only in the JMM (not for the general public)
and in that document the term atomic is also more correct.

> Otherwise, the idea is that you can not cut the loads/stores, so 
> "indivisible" is also ok.

The JVMS uses “indivisible” (see above) to amplify the term “atomic”,
but the primary term is “atomic”.

— John

Re: atomicity for value types

2020-01-14 Thread John Rose
On Dec 18, 2019, at 5:46 PM, John Rose  wrote:
> 
> - Define a contextual keyword “alwaysatomic" (working title “__AlwaysAtomic”).

I just referred more carefully to the draft JEP on keyword
management https://openjdk.java.net/jeps/8223002 and
realize that it guides us toward a so-called “hyphenated
contextual keyword” in preference to a “unitary contextual
keyword”.  That is, “always-atomic” is more in keeping with
that document than “alwaysatomic”.

I do think this looks more jarring than the unitary keyword:

always-atomic inline class Cursor { … }

But, that’s only because it’s an early hyphenated keyword,
which nobody is eye-trained on yet.  If we believe that
hyphenated keywords are the wave of the future, as I do,
then we should embrace it, rather than the old-school
unitary token “alwaysatomic”.

In the prototype I’m using a temporary (unitary contextual) keyword
__AlwaysAtomic, plus a temporary annotation @__alwaysatomic__.


In the JVMs the draft of the corresponding modifier bit looks like this:

ACC_ALWAYSATOMIC  0x0040  Instances of this inline type are never torn.



Re: atomicity for value types

2020-01-14 Thread John Rose
On Jan 14, 2020, at 9:11 AM, Doug Lea  wrote:
> 
> On 1/13/20 4:44 PM, Tobi Ajila wrote:
>> Hi John
>> 
>> Given that inline types can be flattened there is a possibility that
>> data races will occur in places where users were not expecting it
>> before. So your `__AlwaysAtomic` modifier is a necessary tool as the
>> existing spec will only enforce atomicity for 32bit primitives and
>> references. I just want to confirm if the intention of the
>> __AlwaysAtomic bit on an inline class is only to ensure atomic reads and
>> writes of inline types and that there are no happens-before ordering
>> expectations as there are with the existing volatile modifier on fields.
>> 
> 
> In which case "__AlwaysOpaque" would be a more accurate term.


Very interesting!  I guess this is the most relevant definition of opaque:
  http://gee.cs.oswego.edu/dl/html/j9mm.html#opaquesec

Doug, in honor of one of your pet expressions, I would have
preferred to spell this keyword “not-too-tearable”, but
that’s truly an opaque phrase.

OK, so the above document defines a nice linear scale of four
memory access modes, ordered from weak to strong: Plain,
Opaque, Release/Acquire, and Volatile.  “Any guaranteed
property of a weaker mode, plus more, holds for a stronger
mode.”
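These modes are selectable today through the `VarHandle` API. For instance, Opaque-mode access to a 64-bit field (which, unlike Plain access, is guaranteed bitwise atomic) looks like this sketch (names illustrative):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Opaque-mode access via VarHandle: per-variable bitwise atomicity and
// coherence, with no cross-variable ordering (weaker than acquire/release).
class OpaqueDemo {
    long value;  // Plain access to a long may tear; Opaque access may not

    static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(OpaqueDemo.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish(long v) { VALUE.setOpaque(this, v); }   // indivisible store
    long read()          { return (long) VALUE.getOpaque(this); }
}
```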

For a JIT writer, this means stronger modes will require
additional ordering constraints, in IR and/or as hardware
fence instructions, and perhaps also stronger memory
access instructions.  In the worst case (which we may
see with inline types) library calls may be required
to perform some accesses — plus the space overhead of
control variables for things like seq-locks or mutexes.

The effect of the Plain mode on atomicity is described here:
> Additionally, while Java Plain accesses to int, char, short, float, byte, and 
> reference types are primitively bitwise atomic, for the others, long, double, 
> as well as compound Value Types planned for future JDK releases, it is 
> possible for a racy read to return a value with some bits from a write by one 
> thread, and other bits from another, with unusable results.

Then, Opaque mode tightens up the behavior of Plain mode by
adding Bitwise Atomicity (what I want here), plus three more
guarantees: Per-variable antecedence acyclicity, Coherence,
and Progress.

The document then suggests that these three more guarantees
won’t inconvenience the JIT writer:
> Opaque mode does not directly impose any ordering constraints with respect to 
> other variables beyond Plain mode.

But I think there might be inconveniences.  Our current prototype
doesn’t mess with STM or HTM, but just buffers every new value
(under always-atomic or volatile) into a freshly allocated heap node,
issues a Release fence, and publishes the node reference into the
relevant 64-bit variable.  The node reference itself is stored in Plain
(Relaxed) mode, not Opaque or Release mode, and subsequent loads
are also relaxed (no IR or HW fences).

What we are doing with this buffering trick is meeting the requirements
of atomicity by using the previously-specified mechanisms for safe
publication (of regular identity classes with final instance variables).
In order to use this trick correctly we need to ensure that the specified
behavior of the always-atomic store does not make additional requirements.
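The buffering trick can be modeled at the Java level (names illustrative): box the multi-field value into a fresh node with final fields and publish a single reference; the final-field guarantees of JLS 17.5 then stand in for the Release fence described above.

```java
// Java-level model of the buffering trick: each store allocates a fresh
// immutable node and publishes a single reference (a plain, relaxed store).
// Final-field semantics (JLS 17.5) guarantee that any reader who sees the
// node reference -- even via a race -- sees both fields as constructed.
final class Pair {
    final long lo, hi;                    // frozen at end of construction
    Pair(long lo, long hi) { this.lo = lo; this.hi = hi; }
}

class AtomicPairHolder {
    private Pair current = new Pair(0, 0);

    void store(long lo, long hi) { current = new Pair(lo, hi); }  // plain publish

    long[] load() {
        Pair p = current;                 // one reference read: never torn
        return new long[] { p.lo, p.hi };
    }
}
```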

When I look at the HotSpot code, I find that, if I were to classify
loads and stores of always-atomic as always-Opaque, I would find
myself adding more IR constraints than if I simply use the trick of
buffering for safe publication.  Maybe HotSpot is doing some overkill
on Opaque mode (see notes below for evidence of that) but I can’t
help thinking that at least the requirement of Progress (for Opaque)
will require the loop optimizer to take special care with always-Opaque
variables that it would not have to take with merely always-atomic ones.

This is a round-about way of saying, “really Opaque?  Why not just atomic?”
If I take always-Opaque as the definition I can use a clearly defined
category in upcoming JMM revisions (good!) but OTOH I get knock-on
requirements (slow-downs) from that same category (bad!).

It’s not right to say, “but always-atomic values will *always* be
*slow* as well, so quit complaining about lost optimizations”.
That’s because the JVM will often pack small always-atomic values
into single memory units (64- or 128-bit, whatever the hardware
supports with native atomicity).  In such cases, Plain order has
a real performance benefit relative to Opaque order, yes?

So, in the end, I’d like to call it always-atomic, and leave Opaque
mode as an additional opt-in for these types.

— John

P.S. More background, FTR:

Our intention with always-atomic types is to guarantee a modest
extension of type safety, that combinations of field values which
appear from memory reads will never be different from combinations
that have been created by constructor code (or else they are the default
combination).  This appeal to constructors extends type 

Re: Reference-default style

2019-12-20 Thread John Rose
On Dec 20, 2019, at 12:04 PM, Brian Goetz  wrote:
> 
> The other direction is plausible too (when `T extends InlineObject`), though 
> I don't have compelling examples of this in mind right now, so its possible 
> that this is only a one-way requirement.

You’ve already looked at this, but to spell out more details FTR:

Reversing arrows, as T.ref is useful as a return type, so T.inline is useful
as an argument type.  T.ref as a return value means “might produce nulls”.
And conversely T.inline as an argument means “won’t accept nulls”.
Both are cases where the usual application of Postel’s Law needs an
explicit variance.  It’s really about nulls more than inlines per se.

That’s all very nice, but an erased generic which applies only to inlines
seems like a relatively useless generic.  (Could be failure of imagination,
though.)  And applying such a generic (with “T.inline” arguments) to
identity and reference types creates cognitive dissonance in the user
model.

I guess T.notnull would be more honest, as far as generics are concerned.
But adding that (hello T! T?) is a new job, with lots of knock-on complexity.

So, no compelling examples (reasonable cost, good benefit), as you said.

— John

Re: Superclasses for inline classes

2019-12-20 Thread John Rose
On Dec 20, 2019, at 7:59 AM, Brian Goetz  wrote:
> 
> Stepping back, the thing that frightened us away from this was the 
> combination of (a) not wanting to have a modifier on abstract classes to 
> indicate inline-friendly,

Perhaps that’s made better by inverting the sense of the “bit” (perhaps 
modifier), so that inline-hostile classes have the burden of marking 
themselves, and inheritance can help with that (but only if the bit is 
inverted).

But inline-friendly types still need to contrive declaratively-empty 
constructors, so it looks like there’s a lot of recompilation in our future.

> and (b) worrying that it was a lot of (brittle) work to structurally detect 
> inline-friendly abstract classes.  Dan has cut this knot by tying it to the 
> presence of a declaratively-empty constructor, which changes the story a lot.

This is a specific case of making abstract classes more like interfaces.  If 
you look carefully at interfaces, you can see that their flexibility derives, 
in large part, from the lack of constructors.  (This in turn implies the lack 
of instance fields, but in a secondary way: You can’t control such fields fully 
without a constructor.)  Having declaratively empty constructors in classes 
(abstract or not!) opens for them some of the same paths that interfaces enjoy.

This reasoning can go the other way, too, to make interfaces more like abstract 
classes.  As I’ve pointed out before, interfaces could be given explicit 
declaratively-empty constructors, which in turn could be given less-public 
access than the interfaces themselves.  This would provide the same level of 
subclass control for interfaces as for abstract classes, with an effect close 
to “sealed interfaces”, from a class-like primitive (access control on 
constructors).  In particular an interface could be sealed to its nest mates by 
marking its declaratively-empty constructor as private (and so on).
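For comparison, that effect is spelled today with `sealed` (the names below are illustrative); John’s observation is that a private declaratively-empty constructor on the interface would express the same nestmate restriction through ordinary constructor access control.

```java
// How the effect is spelled today: a sealed interface restricted to its
// nestmates.  An access-controlled empty constructor would express the
// same constraint via "class physics" rather than a dedicated feature.
class SealedDemo {
    sealed interface Shape permits Circle, Square {}
    record Circle(double r) implements Shape {}
    record Square(double s) implements Shape {}

    static double area(Shape sh) {
        // Sealing guarantees Circle and Square are the only implementations.
        if (sh instanceof Circle c) return Math.PI * c.r() * c.r();
        else return ((Square) sh).s() * ((Square) sh).s();
    }
}
```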

I’m not pointing this out to say we should have done sealing differently; I love the 
way sealing turned out.  But it’s important to be aware of some underlying 
“class physics” at play here with both interfaces and abstracts.  As a VM guy I 
tend to see the “physics" this way as logical design constraints that bubble up 
from the VM, instead of starting with a desired psychology and working down 
through the chemistry (if you get my drift).

Over time I see interfaces becoming more like abstract classes (notably with 
default methods), and abstract classes returning the favor by growing 
declaratively-empty constructors.  This is not an accident.  I’m convinced that 
as we continue to pay attention to the “physics”, we will be better informed in 
our treatment of other aspects of types, including instance fields and identity.

> (I remain unconvinced that instance fields in inline-friendly abstract 
> classes could possibly be in balance, cost-benefit-wise, and 
> super-unconvinced about inlines extending non-abstract classes.)

My concern here is to point out the logical possibility of such things, not to 
advocate for them now.  Treating fields and non-abstract supers as corollaries, 
rather than axioms, makes me more certain we have grasped the physical 
essentials of the problem.  This is a valuable design heuristic:  We know that 
if we can say “no” to features while still understanding how they could fit in 
the future, we have arrived at a more factored, more desirable design.  This is 
why I’m talking now, hypothetically, about fields and concretes.

(Which is more important, physics or psychology?  Neither and both, I suppose, 
but physics has this privilege:  If you build on an inconsistent or 
gratuitously complex logical foundation, your user experience will never ever 
be as smooth as it could be.)

> As John says, there are three potential states for an abstract type: 
> always-identity (true of traditional abstract classes), always-inline, and 
> identity-agnostic.  (A possibly way to capture these is by leaning on our new 
> friends IdentityObject and InlineObject; an always-inline abstract type 
> implements InlineObject, and always-identity abstract type implements 
> IdentityObject, and an agnostic one implements neither.)  It is also possible 
> we might prune away the always-inline flavor of abstract types, leaving us 
> with two: inline-friendly or identity-locked.

I think the always-inline flavor of abstract *classes* can be replaced, for
most use cases, with interfaces (with sealing + default methods - toString),
and later on with templates (as long as the polymorphism is parametric).
Many uses of an always-inline flavor of *interfaces* can be replaced by a
sealed interface, where all the permits are inlines.

Dan and Remi have pointed out some places where we might be confronted
with a demand for more, which means we should keep our eyes open.
I’m very happy to bid less, for now, and perhaps forever.

> From a JLS perspective, we could say an abstract type is inline-friendly iff:
>  - its 

stronger deprecation

2019-12-20 Thread John Rose
(Splitting out a mini-topic.)

On Dec 20, 2019, at 7:59 AM, Brian Goetz  wrote:
> 
> … the terminally confusing getInteger(String).  Maybe some further 
> deprecation of static inheritance is warranted here

The translation strategy and JVM have a mechanism for totally
submerging such methods, so that they are no longer visible to
the source code; it’s ACC_SYNTHETIC.  A synthetic method
occupies a descriptor and is linkable and reflectable but cannot
be used from source code.

There is no syntax for defining such things in source code;
the compiler back end spits them out into class files.  But if
these noxious methods were to be deprecated to the point
of unusability *in source code*, yet still needed to be present
as linkage points for old classfiles, we could create a marking,
and a user model, for keeping them around.

We could define a modifier with the appropriate properties
and slap it on offenders like getInteger.

Here’s a PoC design FTR:

https://bugs.openjdk.java.net/browse/JDK-8236444

(Something like this might also be appropriate for non-deprecated
“back doors” like deserialization API points.)

— John

Re: Superclasses for inline classes

2019-12-19 Thread John Rose
On Dec 18, 2019, at 3:57 PM, Dan Smith  wrote:
> 
> [Expanding on and summarizing discussion about abstract superclasses from 
> today's meeting.]
> 
> -
> Motivation
> 
> There are some strong incentives for us to support inline classes that have 
> superclasses other than Object. Briefly, these include:
> 
> - Identity -> inline migration candidates (notably java.lang.Integer) often 
> extend abstract classes
> - A common refactoring may be to extend an existing class with a (possibly 
> private) inline implementation
> - Abstract classes are more expressive than interfaces
> - If we compile Foo.ref to an abstract class, we can better represent the 
> full API of an inline class using an abstract class

I’m glad we are cracking open this can of worms; I’ve always thought
that interfaces as inline supers were good enough but not necessarily
the whole story.  

(At the risk of instilling more terror, I’ll say that I think that an abstract
super to an inline could contribute non-static fields, in a way that is
meaningful, useful, and efficient.  The initialization of such inherited
fields would of course use withfield and would require special rules
to allow the initialization to occur in the subclass constructor/factory.
I suppose this is a huge feature, as Dan says later.  A similar effect will
be available from templates, with less special pleading.)

(Does it make sense to allow an abstract class to *also* be inline?
Maybe, although there is a serious question about its default value.
If a type is abstract its default value is probably required to be null.)

A useful organizing concept for abstract supers, relative to inlines,
is a pair of bits, not both true, “always-inlined” and “never-inlined”.
Object and interfaces have neither mark by default.  The super of
an identity class cannot be “always-inlined" and the super of an
inline class cannot be “never-inlined”.  Or, an identity (resp. inline)
class has the “always-inlined” (resp. “never-inlined”) bit set.  And
for every T <: U in the class hierarchy, if T is always-inlined, then
U must not be never-inlined, and vice versa.  Thus if U is marked
then every T <: U is forbidden to have the opposite mark.  Or,
even more simply, both bits are deemed to inherit down to all
subtypes, and no type may contain both marks.
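The rule in the last sentence can be modeled as a small executable check (the names are illustrative, not proposed API): marks inherit down, and a type whose effective mark set contains both is rejected.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Executable model of the mark lattice: both bits inherit down to all
// subtypes, and no type may end up carrying both.
class MarkCheck {
    enum Mark { ALWAYS_INLINED, NEVER_INLINED }

    // effective marks of T = T's declared marks plus all supers' marks
    static Set<Mark> effectiveMarks(Set<Mark> declared, List<Set<Mark>> supers) {
        Set<Mark> all = EnumSet.noneOf(Mark.class);
        all.addAll(declared);
        supers.forEach(all::addAll);
        if (all.size() > 1)
            throw new IllegalStateException("both always- and never-inlined");
        return all;
    }
}
```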

I don’t know how to derive those bits from surface syntax.  A marker
interface for each is a first cut: AlwaysInlined, NeverInlined.  Marker
interfaces are kind of smelly.  These particular ones work a little better
than their complements (InlineFriendly, IdentityFriendly) because
they exclude options rather than include them.

(Could a *non-abstract* inline be a super of another inline?  No, I’d
like to draw the line there, because that leads to paradoxes with flattening,
or else makes the super non-flattenable in most uses, or violates a
substitutability rule.)

> To be clear, much of this has to do with migration, and I subscribe to a 
> fairly expansive view of how much we should permit and encourage migration. I 
> think most every project in the world has at least a few opportunities to use 
> inline classes. Our design should limit the friction necessary (e.g., 
> disruptive redesigns of type hierarchies) to integrate inline classes with 
> the existing body of code.

You have a point.  Migration is not a task but a way of life?

> We've considered, as an alternative, supporting transparent migration of 
> existing classes to interfaces. But this raises many difficult issues 
> surrounding source, binary, and behavioral compatibility. It would be nice 
> not to have to tackle those issues, nor introduce a lot of caveats into the 
> class -> interface migration story.

As I said earlier, for value types we have this recurring need to bend
interfaces to be more like abstract classes, or else allow abstract classes
to become more like interfaces.

> -
> Constraints
> 
> Inline class instantiation is fundamentally different from identity class 
> instantiation. While the language seeks to smooth over these differences, 
> under the hood all inline objects come from 'defaultvalue' and 'withfield' 
> invocations. There is no opportunity in these bytecodes for a superclass to 
> execute initialization code.

In the case we are discussing, the interface-like trick that abstract
classes need to learn is to have (declaratively) empty constructors.

I think that if a class (abstract or not) has a non-empty constructor,
it must also be given the “never-inline” mark.  (This is one reason
that mark isn’t simply a marker interface.)  In this way (or some
equivalent) a class with a non-empty constructor will never attempt
to be the super of an inline.

> (Could we redesign the construction model to properly delegate to a 
> superclass? Sure, but that's a huge new feature that probably isn't justified 
> by the use cases.)

Probably not.  Unless folks demand to factor fields as well as behaviors
into abstract supers of inlines.

> As a 

atomicity for value types

2019-12-18 Thread John Rose
In a nutshell, here’s a proposal for value type atomicity:

- Define a contextual keyword “alwaysatomic" (working title “__AlwaysAtomic”).
- It can only occur with “inline” in class declaration.
- All instances of the given inline class are protected against races.
- Protection is “as if” each field were volatile, “and the same” for array 
elements.
- In the class file the ACC_VOLATILE bit is used (0x0040) to record the keyword.
- This bit does not necessarily disable flattening; if the JVM can get away 
with flattening it should.
- The JVM can get away with flattening such values on stack, in registers, and 
perhaps in final heap variables.
- The JVM can get away with flattening such values if they are “naturally 
atomic”, meaning they can be wholly loaded or stored in one (atomic) hardware 
instruction.
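A minimal model of the last bullet, assuming a 64-bit atomic access width (the class and method names are illustrative, not JVM code):

```java
import java.util.List;

// Illustrative model of "natural atomicity": an alwaysatomic value can
// stay flattened iff its whole payload fits one atomic hardware access,
// assumed here to be 64 bits.
class NaturalAtomicity {
    static final int ATOMIC_BITS = 64;  // assumption: one 64-bit atomic load/store

    static int payloadBits(List<Integer> fieldBits) {
        return fieldBits.stream().mapToInt(Integer::intValue).sum();
    }

    static boolean canFlattenAtomically(List<Integer> fieldBits) {
        return payloadBits(fieldBits) <= ATOMIC_BITS;
    }

    public static void main(String[] args) {
        // an (int x, int y) point: 64 bits, naturally atomic
        System.out.println(canFlattenAtomically(List.of(32, 32)));
        // a (long min, long max) range: 128 bits, needs indirection or fences
        System.out.println(canFlattenAtomically(List.of(64, 64)));
    }
}
```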

More details to follow.  Here’s a backgrounder I wrote a while ago:

http://cr.openjdk.java.net/~jrose/oblog/value-tearing.html

— John



Valhalla EG 20191204

2019-12-04 Thread John Rose
Present: John R., Tobi A., Dan H., Remi F., Fred P.
(Permission slip for Simms, who had a school meeting.)
(Brian is off working on his eclair document…)

agenda: discussion of eclairs, invoke modes (virtual vs. interface)
 ref-object vs. val-object (top types for inlines and refs)
 NOT REACHED: templates, java.lang.Class vs. “crass"

Remi: auto-unboxing is the essential feature of eclairs
 => interface can be empty, except for supertypes
Dan: enforce sealing in VM? John: just a translation strategy hack, maybe
(Fred) VMAC can't have a sealed super, since the VMAC can't be named!
migration of java.util.Optional:  auto-bridging? invokevirtual -> interface?
 Dan: what rules/restrictions?  Remi: see if it can be done with all interfaces
Dan: one CP entry needs to potentially support all invocation modes (even 
errors)
 lots of corner cases in state transitions of resolution and selection
 John: seems to require every methodref CP entry to support all invocation insns
 Remi: can have a list of migrated interfaces and special-case those? (Dan: 
ugly)
 if you have both invokeinterface and invokespecial you need three words!
John: MH-based linkage to handle invokevirtual -> invokeinterface
 John: wrap a Method* metadata pointer around a MethodHandle managed pointer?
 Dan: J9 allocates method wrappers contiguously, but maybe doable
 more bang for the buck to do autobridging!
 Dan: we had a list of use cases, still up to date?
AI: float a loose proposal
(Dan: looking at replacing J9 MH impl with Lambda Forms;
 dual impls. make it harder to do decompilation for deoptimization and debug)
refobj vs. valobj?
Remi: you only need one; Dan: hard to do generics over negative types
 can have compile-time ref-object type which erases to Object
John: java.lang.Record for inlines?
 Remi: Record should be interface, with special permission to implement toString
default methods cannot abstract Object methods, and cannot define finals
 example: final toString method on lambda
 example: JUnit5 can write, parameter of test is a factory, factory uses 
lambdas,
 => printed report has stupid names for lambdas — ouch
 maybe "fat" serializable lambda should have a useful toString method? 
(“fatten” the interface with toString?)



Re: <init> and factories

2019-10-18 Thread John Rose
On Oct 17, 2019, at 1:38 PM, Brian Goetz  wrote:
> 
> 
>> Fine points in the VM prototype:
>> 
>> Would there be any restrictions on the contents of a constructor/factory 
>> method <new>?  (I hope not.)
> 
> I'd be sad if it were possible for an invocation of a `<new>` method to leave 
> a `null` on the stack.

Yes.  And should a factory contract sometimes include a guarantee of an exact 
type for the non-null return value?
(Maybe yes, sometimes no.  Probably null is always wrong; don’t call that a 
factory.)

So this leads to one or two use cases for type operators:

1. Non-null decoration on descriptor.  Could be a template specialization 
NonNull<*C> where
C is the return value and the thing with * is reified.  All factories should 
return this.

(Could be LC//NonNull; or LNonNull//C; or LC[NonNull]; or LNonNull[C]; as a 
decoration
syntax for descriptors.  Various other considerations would determine the 
actual bike shed color.)

2. Exact-type decoration on descriptor.  Could be another template 
specialization Exact<*C>.
Exact types are sometimes nice to have, although they make API points very 
rigid. Sometimes
that’s the goal.
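As a sketch of use case 1, the non-null contract can be modeled as a plain Java wrapper class (the wrapper name and check are illustrative; the actual proposal reifies the decoration in the descriptor rather than as a runtime class):

```java
class NonNullDemo {
    // Illustrative stand-in for a NonNull<*C> decoration: the wrapped
    // value is checked once at construction, so holders of a NonNull<C>
    // never need a null check.
    static final class NonNull<C> {
        final C value;
        NonNull(C value) { this.value = java.util.Objects.requireNonNull(value); }
    }

    public static void main(String[] args) {
        System.out.println(new NonNull<>("ok").value);
        try {
            new NonNull<>(null);                 // a factory must not do this
            System.out.println("no check");
        } catch (NullPointerException e) {
            System.out.println("rejected null");
        }
    }
}
```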

— John

Re: <init> and factories

2019-10-17 Thread John Rose
On Oct 17, 2019, at 11:22 AM, Dan Smith  wrote:
> 
> The plan of record for compiling the constructors of inline classes is to 
> generate static methods named "<init>" with an appropriate return type, and 
> invoke them with 'invokestatic'.
> 
> This requires relaxing the existing restrictions on method names and 
> references. Historically, the special names "<init>" and "<clinit>" have been 
> reserved for special-purpose JVM rules (for example, 'invokespecial' is 
> treated like a distinct instruction if it invokes a method named '<init>'); 
> for convenience, we've also prohibited all other method names that include 
> the characters '<' or '>' (JVMS 4.2.2).
> 
> Equivalently, we might say that, within the space of method names, we've 
> carved out a reserved space for special purposes: any names that include '<' 
> or '>'.
> 
> A few months ago, I put together a tentative specification that effectively 
> cedes a chunk of the reserved space for general usage [1]. The names "<init>" 
> and "<clinit>" are no longer reserved, *unless* they're paired with 
> descriptors of a certain form ("(.*)V" and "()V", respectively). Pulling on 
> the thread, we could even wonder whether the JVM should have a reserved space 
> at all—why can't I name my method "bob>" or "", for example?
> 
> In retrospect, I'm not sure this direction is such a good idea. There is 
> value in having well-known names that instantly indicate important 
> properties, without having more complex tests. (Complex tests are likely to 
> be a source of bugs and security exploits.) Since the JVM ecosystem is 
> already accustomed to the existence of a reserved space for special method 
> names, we can keep that space for free, while it's potentially costly to give 
> it up.
> 
> So here's an alternative design:
> 
> - "<init>" continues to indicate instance initialization methods; "<clinit>" 
> continues to indicate class initialization methods
> 
> - A new reserved name, "<new>", say, can be used to declare factories
> 
> - To avoid misleading declarations, methods named "<new>" must be static and 
> have a return type that matches their declaring class; only 'invokestatic' 
> instructions can reference them
> 
> - The rest of the "<.*>" space of names (plus ".*<.*" and ".*>.*") is held in 
> reserve, available for special purposes as we discover them
> 
> The Java compiler would only use "<new>" methods for inline class 
> construction, for now; perhaps in the future we'll find other use cases that 
> make sense (like surfacing some sort of factory mechanism).
> 
> Does this seem promising? Any particular reason it's better to overload 
> "<init>" than just come up with a new special name?

For my part either outcome is fine.  The prototype overloads <init> but it 
could almost as well have added <new>.
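Dan's name-reservation scheme quoted above can be sketched as a tiny validator; the `<new>` spelling and the helper names are illustrative, not the spec:

```java
class MethodNames {
    /** True if the name is one of the well-known special names. */
    static boolean isSpecial(String name) {
        return name.equals("<init>") || name.equals("<clinit>") || name.equals("<new>");
    }

    /** True if the name falls in the space held in reserve ("<.*>", ".*<.*", ".*>.*"). */
    static boolean isReserved(String name) {
        return (name.contains("<") || name.contains(">")) && !isSpecial(name);
    }

    /** True if a method declaration with this name is allowed at all. */
    static boolean isDeclarable(String name) {
        return !isReserved(name);
    }

    public static void main(String[] args) {
        System.out.println(isDeclarable("<new>"));  // special name: allowed
        System.out.println(isDeclarable("bob>"));   // held in reserve: rejected
        System.out.println(isDeclarable("run"));    // ordinary name: allowed
    }
}
```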

Fine points in the VM prototype:

- A method <init> must be static, and it can be restricted to return exactly 
the type of its declaring class, except in “cases”.
- In some cases (VMACs and hidden classes) the declaring class is not denotable 
in a descriptor; the return type must be a super (maybe always Object).

So the prototype allows Object as a return type from a static <init> function.  
I don’t remember whether it checks that the declaring class is a VMAC in that 
case.

Would there be any restrictions on the contents of a constructor/factory method 
<new>?  (I hope not.)

Would there be any enhancements to the capabilities of a <new> function?

For example, I think we should consider allowing <new> to invokespecial 
super.<init> on a new instance, and/or putfield into the final fields of the 
new instance.
If we don’t allow this, then translation strategies may have to spin private 
<init> methods to handle the super call and final field inits, which seems 
suboptimal to me.
(To be clear:  I’m thinking of using <new> here in a non-inline class.)

One result of using a different name (<new>) is that there’s no need to require 
that it be static (or that it not be).
I don’t think there’s any benefit to requiring that <new> be static.  (Well 
maybe some:  It partitions <new> from
any kind of virtual call.)  Maybe a non-static <new> could serve as a factory 
method which takes the current
instance and “reconstructs” it as a new instance.  But that can be done by 
wrapping a static <new> into some
other method m, and then there’s no confusion about making m virtual.
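The wrapping idea in the last paragraph can be sketched in today's Java, with an ordinary static factory standing in for the proposed static factory name (all names here are illustrative):

```java
class Point {
    final int x, y;
    private Point(int x, int y) { this.x = x; this.y = y; }

    // Stand-in for the static factory (<new> in the proposal above).
    static Point make(int x, int y) { return new Point(x, y); }

    // Virtual "reconstruction": takes the current instance and builds a
    // new one by delegating to the static factory -- the wrapper method m.
    Point withX(int newX) { return make(newX, this.y); }

    public static void main(String[] args) {
        Point p = Point.make(1, 2).withX(5);
        System.out.println(p.x + "," + p.y);
    }
}
```

There is no confusion here about `withX` being virtual, while the factory itself stays static.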

> [1] 
> http://cr.openjdk.java.net/~dlsmith/lw2/lw2-20190628/specs/init-methods-jvms.html

Using something like <new> is a forced move for inline classes.  It is also 
(IMO) a fruitful move for
regular non-inline (“identity”) classes.  If the translation strategy were 
adjusted to translate every
new Foo() expression as invokestatic <new>, the following benefits would appear:

- Less reliance on the verifier to validate arbitrary-in-the-wild 
“new/dup/invokespecial” code shapes.  (It’s been buggy in the past.)
- Simpler more optimizable bytecode for complex expressions like new A(…new 
B()…), currently a pain point in our JITs.
- A more direct path for migrating “new VT()” expressions from VT as a 
value-based class to an inline class.  (No 
