Re: What's in a CONSTANT_Class?

2017-06-14 Thread John Rose
On Jun 14, 2017, at 8:54 AM, Karen Kinnear  wrote:
> 
> We would like to request that for the MVT Early Access we keep the TEMPORARY 
> CONSTANT_Class_info “;Q”.

Nit: For uniformity, the syntax wants to be ";" + field_signature,
which implies ";Q;".  Without that uniformity you need
to specify a third syntax (neither field nor method signature),
which is not good spec. economy, even for a temporary feature.
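
A minimal sketch of that uniform rule, assuming the temporary entry has the shape ";Q" + class name + ";" and recognizing only the Q form of field signature. The class TempClassName and its methods are hypothetical names for illustration, not HotSpot or ASM code.

```java
final class TempClassName {
    /** Kinds of CONSTANT_Class name strings distinguished by this sketch. */
    enum Kind { PLAIN_NAME, VALUE_TYPE }

    /**
     * Classifies a CONSTANT_Class name under the rule ";" + field_signature.
     * Assumption: only the Q form is accepted after the leading ';'.
     */
    static Kind classify(String name) {
        if (!name.startsWith(";")) {
            return Kind.PLAIN_NAME;          // ordinary internal name, e.g. "java/lang/String"
        }
        String sig = name.substring(1);      // the remainder must be a field signature
        boolean wellFormed = sig.length() > 2
                && sig.charAt(0) == 'Q'
                && sig.endsWith(";");
        if (!wellFormed) {
            throw new IllegalArgumentException("not ';' + field signature: " + name);
        }
        return Kind.VALUE_TYPE;
    }

    /** Extracts the class name from ";Q" + name + ";". */
    static String valueClassName(String name) {
        if (classify(name) != Kind.VALUE_TYPE) {
            throw new IllegalArgumentException(name);
        }
        return name.substring(2, name.length() - 1);
    }
}
```

With this shape the part after the leading ';' can be handed to an existing field-signature parser, so no third syntax is needed.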



Re: What's in a CONSTANT_Class?

2017-06-14 Thread John Rose
On Jun 14, 2017, at 9:22 AM, Remi Forax  wrote:
> 
> With my ASM Hat,
> both CONSTANT_Class_info “;Q” and CONSTANT_ValueType_info that 
> references an UTF8 are Ok for me.

Between those two I prefer the first since it doesn't require a new CP tag.

> Weirdly, having a CONSTANT_Value_info that references a CONSTANT_Class_info is 
> a little harder to implement because the implementation of ASM is sensitive to 
> the number of levels of indirection (it's hardcoded to be 4, a constant 
> method handle has 4 levels).

Interesting fact.  Won't that have to change with condy?
That allows bootstrap specifications to be recursive.

> On the longer term, I think that the spec of CONSTANT_Class should be changed to 
> accept a class descriptor and not a class name (which it is not, BTW, because 
> arrays are accepted in order to encode a method call to an array clone()).
> It will allow more sharing and unlike a class name, a class descriptor is an 
> extensible format.

[Flat strings won't take us there]

Remi, flat strings don't go far enough.  They are moderately
extensible, and certainly accommodate new ground types like
QFoo; and UFoo;, but there are two big problems.  First, they
suffer from combinatorial explosion (*less* sharing in flat strings),
and second, they incompletely support expression-holes, which
are required when we get to generics.

We live with the combinatorial problems of method type descriptors,
but I think that's a place we want to retreat from. (Look at the encoding
of (Object,Object,Object)Object:  The flatness requires repetition of
the whole qualified name four times, just in this one descriptor.)
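
To make the repetition concrete, here is a small sketch that builds the descriptor for (Object,Object,Object)Object with java.lang.invoke.MethodType and counts how often the qualified name appears. The class and method names are illustrative only.

```java
import java.lang.invoke.MethodType;

final class DescriptorRepetition {
    /** Counts non-overlapping occurrences of needle in haystack. */
    static int count(String haystack, String needle) {
        int n = 0;
        int i = haystack.indexOf(needle);
        while (i >= 0) {
            n++;
            i = haystack.indexOf(needle, i + needle.length());
        }
        return n;
    }

    /** The flat descriptor string for (Object,Object,Object)Object. */
    static String threeObjectsToObject() {
        // genericMethodType(3) is the method type with three Object
        // parameters and an Object return type.
        return MethodType.genericMethodType(3).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        String d = threeObjectsToObject();
        System.out.println(d);
        System.out.println(count(d, "java/lang/Object"));
    }
}
```

The descriptor spells out java/lang/Object once per parameter and once for the return type, so nothing is shared inside the string.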

When we go to parameterized types, ground types will have multiple
levels of nesting, which turns the problem from quadratic to
exponential.  At that point it's more than today's irritant.

You can patch this with repeat operators, but the natural format
is a tree, which represents all subparts uniformly, rather than some
as a defining use, and others as repeated uses.

[String-tagged shallow trees]

For non-ground generic types, a type string could be something
like a format string.  (The format "hello, %s" has a string-typed hole.)
In that case, the string doesn't give you everything you need;
it must be joined by a vector of operands.  At that point you've
invented trees, and then the real question is whether tree nodes
should be tagged by format strings (an infinite number of them)
or by a handful of simple CP-style tags.

I handled both these issues in Pack200 with the CONSTANT_Signature
CP type (present only in Pack200 archives), whose content is a format
string (with N>=0 holes) plus an implicitly counted vector of (N) CP refs
of type CONSTANT_Class.  (Primitives are inlined.)  For technical
reasons the hole syntax, if any, must be different from either string
format notations and Pack200 with future JVMs; I think it should be
a simple period '.'.  (For discussion of signature meta-characters, see
my "Symbolic Freedom" manifesto ca. 2008.)

For values+generics we'll probably want to look at an experimental design
like this that uses string-tagged tree nodes.  They are very compact (hence
their use in Pack200).
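
A rough sketch of the string-tagged idea, assuming '.' as the hole character (as suggested above) and ordinary JVM descriptors as input. It illustrates "format string plus operand vector" only; it is not the actual Pack200 encoding, and SignatureForm, Split, split and join are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class SignatureForm {
    /** A format string with '.' holes plus the class names that fill them, in order. */
    static final class Split {
        final String form;
        final List<String> classes;
        Split(String form, List<String> classes) {
            this.form = form;
            this.classes = classes;
        }
    }

    /** Replaces each "Lname;" in a descriptor with '.' and collects the names. */
    static Split split(String descriptor) {
        StringBuilder form = new StringBuilder();
        List<String> classes = new ArrayList<>();
        int i = 0;
        while (i < descriptor.length()) {
            char c = descriptor.charAt(i);
            if (c == 'L') {
                int end = descriptor.indexOf(';', i);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated class: " + descriptor);
                }
                classes.add(descriptor.substring(i + 1, end));
                form.append('.');            // the hole
                i = end + 1;
            } else {
                form.append(c);              // primitives, '[', '(' and ')' stay inline
                i++;
            }
        }
        return new Split(form.toString(), classes);
    }

    /** Inverse of split: fills each '.' hole with the next class name. */
    static String join(Split s) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (char c : s.form.toCharArray()) {
            if (c == '.') {
                out.append('L').append(s.classes.get(next++)).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Internal class names use '/' rather than '.', which is why a period can serve as the hole marker in this sketch. Each class name is then a separate operand that can be shared across many forms.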

[Byte-tagged deep trees]

But I think for ease of tooling we will end up with the other option,
which is *more* tree nodes tagged by a very small finite set of
CP-style tags.  This is why I support designs like the ones
Dan has been sketching.

In that style of tree, a format string like "hello, %s" breaks down into
nested AST (Append[Literal["hello, "],Param[]]).  Instead of parsing
the string to find holes, the holes are directly represented, along
with every other part, in a strongly-typed AST tree.
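
As an illustration of the byte-tagged style, the "hello, %s" example can be written as a tiny tree in which the hole is its own node. This is a sketch only; the node names follow the Append/Literal/Param spelling above, and the render protocol is an assumption made for the example.

```java
import java.util.Iterator;
import java.util.List;

final class FormatTree {
    /** A tree node; render fills holes from args, left to right. */
    interface Node {
        void render(StringBuilder out, Iterator<String> args);
    }

    /** Literal text with no holes. */
    static final class Literal implements Node {
        final String text;
        Literal(String text) { this.text = text; }
        public void render(StringBuilder out, Iterator<String> args) {
            out.append(text);
        }
    }

    /** A hole, represented directly rather than discovered by parsing a string. */
    static final class Param implements Node {
        public void render(StringBuilder out, Iterator<String> args) {
            out.append(args.next());
        }
    }

    /** Concatenation of child nodes. */
    static final class Append implements Node {
        final List<Node> parts;
        Append(Node... parts) { this.parts = List.of(parts); }
        public void render(StringBuilder out, Iterator<String> args) {
            for (Node p : parts) {
                p.render(out, args);
            }
        }
    }

    /** Renders a tree against a vector of operands. */
    static String format(Node tree, String... args) {
        StringBuilder out = new StringBuilder();
        tree.render(out, List.of(args).iterator());
        return out.toString();
    }

    public static void main(String[] args) {
        Node hello = new Append(new Literal("hello, "), new Param());
        System.out.println(format(hello, "world"));
    }
}
```

No string scanning is involved: a tool walking the tree sees the hole as a node of its own kind.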

An advantage of Dan-style trees is they are more strongly normalizing.
With the format-based trees you always have small types sliding inline
into the format strings, or out as explicit nodes (for uses like ldc).
The programmer's educated instincts prefer one way to say one
thing, rather than many ways to say the same thing.  Stronger
normalization leads to better compactness and fewer bugs.

[Constant inlining?]

Dan-style trees *could* be made much more compact, comparable
to format strings, by extending the CP to support inlining of constant
expressions into other expressions.  This weakens the strong normalization
of constants, but at a lower level where it can be hidden; constants
presented via tools like ASM can be normalized easily, with a single
clever rule ("unwind the inlining by making temporary CP nodes").
ASM does stuff like this in reverse already, by interning ("normalizing")
constants.

We probably need something like this anyway, for the future
CONSTANT_Group syntax, which doesn't pay for itself if it has to
burn its way through the limited (u2) index space of the CP; so it
needs some form of inlining, for constants that occur only inside
the group and don't need global sharing.

> From the VM point of view, it's easy to know if a CONSTANT_Class is a 
> descriptor or not, i

Re: What's in a CONSTANT_Class?

2017-06-14 Thread Remi Forax
Hi Karen,
With my ASM Hat,
both CONSTANT_Class_info “;Q” and CONSTANT_ValueType_info that references 
an UTF8 are Ok for me.

Weirdly, having a CONSTANT_Value_info that references a CONSTANT_Class_info is 
a little harder to implement because the implementation of ASM is sensitive to 
the number of levels of indirection (it's hardcoded to be 4, a constant method 
handle has 4 levels).

On the longer term, I think that the spec of CONSTANT_Class should be changed to 
accept a class descriptor and not a class name (which it is not, BTW, because arrays 
are accepted in order to encode a method call to an array clone()).
It will allow more sharing and unlike a class name, a class descriptor is an 
extensible format.

From the VM point of view, it's easy to know if a CONSTANT_Class is a 
descriptor or not: if it's a descriptor, the last character is a ';'.
I also think that the bytecode version corresponding to 10 should require that 
all CONSTANT_Class entries are encoded as class descriptors.
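
The last-character test can be sketched as below. ClassEntryForm and looksLikeDescriptor are hypothetical names, and the sample strings are only examples of the forms discussed in this thread.

```java
final class ClassEntryForm {
    /** The proposed test: a class descriptor ends with ';'. */
    static boolean looksLikeDescriptor(String s) {
        return s.endsWith(";");
    }

    public static void main(String[] args) {
        // A plain internal class name, as CONSTANT_Class holds today.
        System.out.println(looksLikeDescriptor("java/lang/String"));
        // The same class written as a descriptor.
        System.out.println(looksLikeDescriptor("Ljava/lang/String;"));
        // Array forms already allowed in CONSTANT_Class: an array of
        // references ends with ';', an array of primitives does not.
        System.out.println(looksLikeDescriptor("[Ljava/lang/String;"));
        System.out.println(looksLikeDescriptor("[I"));
    }
}
```

Note that under this test a primitive array such as "[I" is not classified as a descriptor, so that case would need its own handling.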

regards,
Rémi

- Original Message -
> From: "Karen Kinnear" 
> To: "Dan Smith" 
> Cc: valhalla-spec-experts@openjdk.java.net
> Sent: Wednesday, June 14, 2017 17:54:07
> Subject: Re: What's in a CONSTANT_Class?

> Update from hotspot implementation:
> 
> We would like to request that for the MVT Early Access we keep the TEMPORARY
> CONSTANT_Class_info “;Q”.
> 
> This is far easier for us to implement (we have a prototype in progress) and
> we believe that it will be easier for bytecode generators to adopt - which
> will allow us to get more people trying MVT so we get more feedback.
> 
> We would also like to keep the explicit separate name for the derived value
> class, so that from an implementation
> standpoint we are able to continue to use the name, class loader pair as a
> unique lookup.
> So the JVMS as proposed explicitly calls out 5.3 Creation and Loading that the
> derived value class has the name ClassName$Value.
> 
> For Early Access we would like to keep this naming convention, stable across
> reboots, so people can generate byte codes
> that reference value types by name distinctly from their value capable class.
> 
> thanks,
> Karen
> 
> p.s. this will allow us time to do the longer-term exploration of where the
> class/type/constant pool forms should evolve


Re: What's in a CONSTANT_Class?

2017-06-14 Thread Karen Kinnear
Update from hotspot implementation:

We would like to request that for the MVT Early Access we keep the TEMPORARY 
CONSTANT_Class_info “;Q”.

This is far easier for us to implement (we have a prototype in progress) and we 
believe that it will be easier
for bytecode generators to adopt - which will allow us to get more people 
trying MVT so we get more feedback.

We would also like to keep the explicit separate name for the derived value 
class, so that from an implementation
standpoint we are able to continue to use the name, class loader pair as a 
unique lookup.
So the JVMS as proposed explicitly calls out 5.3 Creation and Loading that the 
derived value class has the name ClassName$Value.

For Early Access we would like to keep this naming convention, stable across 
reboots, so people can generate byte codes
that reference value types by name distinctly from their value capable class.
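
A minimal sketch of the naming convention and of a (name, class loader) lookup key. ValueClassNaming and Key are hypothetical illustrations, not HotSpot data structures.

```java
final class ValueClassNaming {
    /** Name of the derived value class for a value capable class, per the draft. */
    static String derivedName(String vccName) {
        return vccName + "$Value";
    }

    /** Hypothetical lookup key: a (name, class loader) pair. */
    static final class Key {
        final String name;
        final ClassLoader loader;   // null stands for the bootstrap loader here

        Key(String name, ClassLoader loader) {
            this.name = name;
            this.loader = loader;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            // Loaders are compared by identity, names by content.
            return k.name.equals(name) && k.loader == loader;
        }

        @Override
        public int hashCode() {
            return 31 * name.hashCode() + System.identityHashCode(loader);
        }
    }
}
```

Because the derived class has its own name, the value capable class and its derived value class get different keys under the same loader, and the pair stays a unique lookup.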

thanks,
Karen

p.s. this will allow us time to do the longer-term exploration of where the 
class/type/constant pool forms should evolve

Re: Draft of spec for Minimal Value Types

2017-06-14 Thread Karen Kinnear
Dan,

Thank you for the responses. Summary - we are good with the current JVMS 
description of decoupling VCC and DVC initialization and linking
as long as you add vbox to require initialization of the VCC.


> On Jun 13, 2017, at 6:51 PM, Dan Smith  wrote:
> 
>> On Jun 13, 2017, at 3:26 PM, Karen Kinnear wrote:
>> 
>> I wanted to follow up specifically on the load/link/init relationships for 
>> the Value Capable Class (VCC) and the derived Value Class (DVC) to use the 
>> terms in this JVMS draft. (Note: direct value class is the longer term 
>> directly defined value class which I have been calling a Valhalla Value Type 
>> VVT)
> 
>> Detailed question:
>> In JVMS 5.5 Initialization in your draft - is it intentional that for 
>> anewarray and multianewarray that you mention a direct value class type
>>  - which in your terminology I believe is the future “valhalla value type” 
>> which is directly defined but not derived from a VCC. So that
>> you would not trigger initialization for these instructions for a derived 
>> value class?
>> Would that be the same for vdefault also then?
> 
> I think you're misunderstanding my use of "direct" -- I mean "non-reference" 
> (as opposed to "reference value", which is a pointer). A "value class" is a 
> class with ACC_VALUE set. The only way to get one of those, per current spec, 
> is by deriving it from a VCC, and, sure, "derived value class" is an 
> appropriate term. A "direct value class type" is the Q type of a value class, 
> whether that class is derived or otherwise (were "otherwise" a possibility).
My misunderstanding. Thank you for clearing up the terminology - I will make a 
new cheat sheet :-)
> 
>> I think we are all in agreement that a reference to a DVC must first 
>> pre-load the VCC just as it has to pre-load supertypes.
>> 
>> The question arises about linking and initialization.
>> 
>> So to clarify, the DVC does not have any methods, including  today, 
>> and does not have any statics.
>>   So linking of the DVC itself does nothing.
>>   So initialization of the DVC itself does nothing.
>> 
>> I think there are two models we would use here.
>> 
>> Option 1: super-type model for root class relative to derived class: 
>> pre-link and pre-init VCC when linking or initializing the DVC
>> 
>> Conceptually I think of a DVC and VCC as sharing one set of statics, and a 
>> value class instance today as a “copy” of the instance fields of a VCC 
>> instance. 
> 
>> Longer-term it is not clear if we will have a root class and a derived 
>> class, or conceptually one class file with two derived
>> classes, but I believe the expectation is that there will continue to be one 
>> set of statics, so I would expect the statics to need
>> to be initialized before either derived class created an instance.
>> 
>>   Longer-term it is expected that we will have a single source file 
>> with methods that must be verified before either class can
>> be used.
> 
> If there's one set of statics, I would say there is one class. This approach 
> seems consistent with a model in which we eliminate "Foo$Value" as a class 
> name and just have reference and value flavors of "Foo" instead. At that 
> point, I would say that we only have one thing to load, link, and initialize, 
> and resolution of Q types should trigger that just like resolution of L types.
There is one set of statics.

From an implementation standpoint - the hotspot JVM would like to keep the 
Foo$Value for Early Access since there would be too many changes internally to
handle the transition from name/class loader pair to the triple of 
name/mode/class loader.
We can explore this again after Early Access.
> 
>> Option 2: lazy initialization, lazy linking
>> 
>> Alternatively we could not initialize the VCC until any of the current 
>> instructions either reference a static or create an instance.
>> Note: I would expect vbox to be added to the instructions requiring 
>> initialization in this case since it creates an instance of the VCC
>> 
>> Even in this case I would continue to require initialization and linking 
>> according to the rules you state, e.g. adding initialization
>> based on vdefault, anewarray, multianewarray even if they do nothing other 
>> than a state change.
>> 
>> I do not know in this case how to handle verification errors in the VCC - 
>> i.e. are you still free to operate on the DVC?
>> What happens when you try to vbox?
> 
> I think this describes the approach I've tried to specify. You have to load 
> the VCC before defining the DVC, but otherwise we're talking about two 
> independent classes.
> 
> Good point, 'vbox' should be on the list of instructions that require 
> initialization of the VCC.
> 
> Linking of the VCC would be subject to the general rules for linking (5.8): 
> must happen sometime after loading, sometime before initialization. Errors 
> occur at a point in the program that "might, directly or indirectly, require 
>