Brian pointed out that my list of candidate inline classes in the Identity 
Warnings JEP (JDK-8249100) includes a number of classes that, despite being 
"value-based classes" and disavowing their identity, might not end up as inline 
classes. The problem? Default values.

This might be a good time to revisit the open design issues surrounding default 
values and see if we can make some progress.

Background/status quo: every inline class has a default instance, which 
provides the initial value of fields and array components that have the inline 
type (e.g., in 'new Point[10]'). It's also the prototype instance used to 
create all other instances (start with 'vdefault', then apply 'withfield' as 
needed). The default value is, by fiat, the class instance produced by setting 
all fields to *their* default values. Often, but not always, this means 
field/array initialization amounts to setting all the bits to 0. Importantly, 
no user code is involved in creating a default instance.

Real code is always useful for grounding design discussions, so let's start 
there. Among the classes I listed as inline class candidates, we can put them 
in three buckets:

Bucket #1: Have a reasonable default, as declared.
- wrapper classes (the primitive zeros)
- Optional & friends (empty)
- From java.time: Instant (start of 1970-01-01), LocalTime (midnight), Duration 
(0s), Period (0d), Year (1 BC, if that's acceptable)

Bucket #2: Could have a reasonable default after re-interpreting fields.
- From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime, ZonedDateTime, 
OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion, MinguoDate, HijrahDate, 
JapaneseDate, ThaiBuddhistDate (months and days should be nonzero; null 
Strings, ZoneIds, HijrahChronologies, and JapaneseEras require special handling)
- ListN, SetN, MapN (null array interpreted as empty)

Bucket #3: No good default.
- Runtime.Version (need a non-null List<Integer>)
- ProcessHandleImpl (need a valid process ID)
- List12, Set12, Map1 (need a non-null value)
- All ConstantDesc implementations (need real class & method names, etc.)

There's some subjectivity between the 2nd and 3rd buckets, but the idea behind 
the 2nd is that, with some translation layer between physical fields and 
interpretation of those fields, we can come up with an intuitive default (e.g., 
"0 means January"; "a null String means time zone 'UTC'"). In contrast, in the 
third bucket, any attempt to define a default value is going to be pretty 
unintuitive ("A null method name means 'toString'").

The question here is how much work the JVM and language are willing to do, or 
how much work we're willing to ask clients to do, in order to support use cases 
that don't fall into Bucket #1.

I don't think totally excluding Buckets #2 and #3 is a very good outcome. It 
means that, in many cases, inline classes need to be built up exclusively from 
primitives or other inline types, because if you use reference types, your 
default value will have a null field. (Sometimes, as in Optional, null fields 
have straightforward interpretations, but most of the time programs are 
designed to prevent them.)

Whether we support Bucket #2 but not Bucket #3 is a harder question. It 
wouldn't be so bad if none of the examples above in Bucket #3 become inline 
classes—for the most part they're handled via interfaces, anyway. 
(Counterpoint: inline class instances that are immediately typed with interface 
types still potentially provide a performance boost.) But I'm also not sure 
this is representative. We've noted before that many use cases, like database 
records or data structure cursors, don't have meaningful defaults (what's a 
default mailing address?). The ConstantDesc classes really illustrate this, 
even though they happen to not be public.

Another observation is that if we support Bucket #3 but not Bucket #2, that's 
probably not a big deal—I'm not sure anybody really *wants* to deal with the 
default instance; it's just the price you pay for being an inline class. If 
there's a way to opt out of that extra weirdness and move from Bucket #2 to 
Bucket #3, great.

With that discussion in mind, here are some summaries of approaches we've 
considered, or that I think we ought to consider, for supporting buckets #2 and 
#3. (This is as best I can recall. If there's something I've missed, add it to 
the list!)

[Weighing in for myself: my current preference is to do one of F, G, or I. I'm 
not that interested in supporting Bucket #2, for reasons given above, although 
Option A works for programmers who really want it.]



=== Solutions to support Bucket #2 ===

Two broad strategies here: re-interpreting fields (A, B), and re-interpreting 
the default instance (C, D).

---

Option A: Encourage programmers to re-interpret fields

Guidance to programmers: when you declare an inline class, identify any fields 
for which the default instance should hold something other than zero/null; 
define a mapping for your implementation from zero/null to the value you want.

One way to do this is to define a (possibly private) getter for each field, and 
include logic like 'return month + 1' or 'return id == null ? "UTC" : id'. Or 
maybe you inline that logic, as long as you're careful to do so everywhere. 
Importantly, you also need to reverse the logic in your constructor—for the 
sake of '==', if somebody manually creates the default instance, you should 
set fields to zero/null.

This doesn't work if you want public fields, but that's life as an OO 
programmer.
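
A minimal sketch of what this guidance might look like in practice. Inline 
classes aren't available in released Java, so an ordinary final class stands 
in for one here; 'MonthInZone', its fields, and 'PHYSICAL_DEFAULT' are all 
hypothetical names made up for illustration.

```java
import java.util.Objects;

// Stand-in for an inline class (assumption: a plain final class with final fields).
final class MonthInZone {
    // Physical representation: all-zero/null is read as "January, UTC".
    private final int monthIndex;  // 0 = January ... 11 = December
    private final String zoneId;   // null = "UTC"

    // Simulates the all-zeros default instance that the VM would provide.
    static final MonthInZone PHYSICAL_DEFAULT = new MonthInZone();

    private MonthInZone() {
        this.monthIndex = 0;
        this.zoneId = null;
    }

    MonthInZone(int month, String zoneId) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month: " + month);
        }
        Objects.requireNonNull(zoneId);
        // Reverse the mapping, so a manually created "January, UTC" has
        // exactly the same field values as the default instance.
        this.monthIndex = month - 1;
        this.zoneId = zoneId.equals("UTC") ? null : zoneId;
    }

    // Getters apply the mapping from physical fields to their interpretation.
    int month() { return monthIndex + 1; }
    String zoneId() { return zoneId == null ? "UTC" : zoneId; }

    // Field-wise comparison, standing in for '==' on an inline class.
    @Override public boolean equals(Object o) {
        if (!(o instanceof MonthInZone)) return false;
        MonthInZone m = (MonthInZone) o;
        return monthIndex == m.monthIndex && Objects.equals(zoneId, m.zoneId);
    }

    @Override public int hashCode() { return Objects.hash(monthIndex, zoneId); }
}
```

With this in place, 'new MonthInZone(1, "UTC")' compares equal to the default 
instance, which is the property the constructor's reverse mapping is there to 
preserve.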

In this approach, it would be important that inline classes be expected to 
document their default instance in Javadoc (perhaps with a new Javadoc tag)—the 
interpretation of the default instance is less apparent to users than "all 
zeros".

Limitations:

- It's a fairly error-prone approach. Programmers will absolutely forget to 
apply the mapping in one place, and everything will be fine until somebody 
tries to invoke a particular method on the default instance. Put that bug in a 
security-sensitive context, and maybe you have an exploit. (Choosing good names 
could help somewhat: call your field 'monthIndex', not plain 'month', to remind 
yourself that it's zero-based.)

- Performance impact of an extra layer of computation on all field accesses. 
Probably not a big deal in general, but all those null checks, etc., could have 
a negative impact in certain contexts. And the *appearance* of extra cost might 
scare programmers away from doing the right thing ("eh, I probably won't use 
the default value anyway, I'll just ignore it to make my code faster").

---

Option B: Language support for field re-interpretation

The language allows inline classes to declare fields with mappings to/from an 
internal representation. Just like Option A, but with guarantees that the 
internal representation isn't inappropriately accessed directly.

This pulls on a thread we explored a bit for Amber a while back, some form of 
"abstract fields" or "virtual fields". Maybe there's something there, but it 
seems like a general-purpose feature, and one we're not likely to reach a final 
solution on anytime soon.

---

Option C: Language support for a designated default

The language provides some way for programmers to declare the "logical" default 
instance (something like a special static field). The compiler inserts a test 
for the "physical" default on any field/array access, and replaces it with the 
logical default.

That is:

Point p = points[3];

compiles to

Point p$0 = points[3];
Point p = (p$0 == [vdefault Point]) ? Point.DEFAULT : p$0;

This is much less bug-prone than Option A—the compiler does all the work—and 
much more achievable in the short/medium term than Option B.

Compared to Option B, this pushes the computation overhead from inline class 
field accesses to reads of the inline type from fields/arrays. I don't know if 
that's good or bad—maybe a wash, heavily dependent on the use case.

A few big problems:

- The physical default still exists, and malicious bytecode can use it. If 
programmers want strong guarantees, they'll have to check and throw wherever an 
untrusted instance is provided. (Clients with access to the inline class's 
fields have to do so, too.)

- Covariant arrays mean every read from any array type that might be flattened 
(Object[], Runnable[], ConstantDesc[], ...) has to go through translation logic.

- There's an assumption here that the programmer doesn't intend to use the 
physical default as a valid non-default instance. That's hard for the compiler 
to enforce, and weird stuff happens in fields/arrays if the programmer doesn't 
prevent it. (Could be mitigated with extra implicit logic on field/array writes 
or in constructors.)

---

Option D: JVM support for a designated default

The VM allows inline classes to designate a logical default instance, and the 
field/array access instructions map from the physical default to the logical 
default. The 'vdefault' instruction produces the logical default instance; 
something else is used by the class's factories to build from the physical 
default.

This addresses the first two problems with Option C—the VM gives strong 
guarantees, and can make the translation a virtual operation of certain arrays.

To address the third problem, it seems like we'd need the more complex logic I 
hinted at: on writes, map the physical default to the logical default, and map 
the logical default to the physical default. Do the reverse on reads.
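
A rough sketch of that swap logic, written as ordinary Java rather than as VM 
behavior. Everything here is an assumption made for illustration: 'Point', 
'LOGICAL_DEFAULT', and 'FlatPointArray' are hypothetical, and the parallel int 
arrays merely simulate flattened storage that starts out as all zero bits.

```java
// Hypothetical stand-in for an inline class with a designated logical default.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    static final Point PHYSICAL_DEFAULT = new Point(0, 0); // all-zero bits
    static final Point LOGICAL_DEFAULT = new Point(1, 1);  // assumed author's choice

    // Field-wise comparison, standing in for '==' on an inline class.
    boolean sameAs(Point p) { return x == p.x && y == p.y; }
}

// Simulates a flattened Point[]: fresh storage is all zeros.
final class FlatPointArray {
    private final int[] xs, ys;

    FlatPointArray(int n) { xs = new int[n]; ys = new int[n]; }

    // Swap the two defaults and leave every other value alone.
    // The mapping is its own inverse, so reads undo what writes did.
    private static Point swap(Point p) {
        if (p.sameAs(Point.PHYSICAL_DEFAULT)) return Point.LOGICAL_DEFAULT;
        if (p.sameAs(Point.LOGICAL_DEFAULT)) return Point.PHYSICAL_DEFAULT;
        return p;
    }

    void store(int i, Point p) {
        Point s = swap(p);
        xs[i] = s.x;
        ys[i] = s.y;
    }

    Point load(int i) {
        return swap(new Point(xs[i], ys[i]));
    }
}
```

In this simulation an untouched component reads back as the logical default, 
while an explicitly stored (0, 0) still round-trips as (0, 0), so the physical 
default remains usable as an ordinary value.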

The problem here is bytecode complexity/slowdowns. We've already added some 
complexity to 'aaload'/'aastore' (covariant flattened arrays), and anticipate 
similar changes to 'putfield'/'getfield' (specialized fields), so maybe that 
means we might as well do more. Or maybe it means we're already over budget. :-)

From the users' perspective, if any performance reduction on reads/writes can 
be limited to the inline classes in Bucket #2, *all* the options have a similar 
cost, whether imposed by the programmer, language, or VM. So, to a first 
approximation, slower opcode execution is fine.



=== Solutions to support Bucket #3 ===

Two broad strategies here: rejecting member accesses on the default instance 
(E, F, G), and preventing programs from ever seeing the default instance (H, I).

--- 

Option E: Encourage programmers to guard against default instances

Guidance to programmers: if you don't like your class's default instance, check 
for it in your methods and throw. Maybe Java SE defines a new RuntimeException 
to encourage this.

The simple way to do this is with some boilerplate at the start of all your 
methods:

if (this == MyClass.default) throw new InvalidDefaultException();

More permissive classes could just do some validation on the fields that are 
relevant to a particular operation. (E.g., 'getMonth' doesn't care if 'zoneId' 
is null.)
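
Here's one way the two styles of guard might look, as a sketch only. An 
ordinary final class stands in for an inline class, the static 'DEFAULT' field 
stands in for 'MyClass.default', and 'ZonedMonth' is a made-up example. 
'InvalidDefaultException' is the hypothetical exception mentioned above, not 
an existing Java SE class.

```java
import java.util.Objects;

// Hypothetical exception; Java SE does not define this today.
class InvalidDefaultException extends RuntimeException {}

// Stand-in for an inline class whose all-zeros instance is not meaningful.
final class ZonedMonth {
    private final int month;      // 1..12; 0 only in the default instance
    private final String zoneId;  // non-null; null only in the default instance

    // Simulates the default instance ('MyClass.default' in the text).
    static final ZonedMonth DEFAULT = new ZonedMonth();

    private ZonedMonth() {
        this.month = 0;
        this.zoneId = null;
    }

    ZonedMonth(int month, String zoneId) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month: " + month);
        }
        this.month = month;
        this.zoneId = Objects.requireNonNull(zoneId);
    }

    private boolean isDefault() { return month == 0 && zoneId == null; }

    // Strict style: reject the default instance up front.
    String describe() {
        if (isDefault()) throw new InvalidDefaultException();
        return month + "@" + zoneId;
    }

    // Permissive style: validate only the field this operation needs,
    // so 'zoneId' is never inspected here.
    int getMonth() {
        if (month == 0) throw new InvalidDefaultException();
        return month;
    }
}
```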

This doesn't work if you want public fields, but that's life as an OO 
programmer.

It's not ideal that an invalid instance can float around a program until 
somebody trips on one of these checks, rather than detecting the invalid value 
earlier—we're propagating the NPE problem. And it takes some getting used to 
that there are two null-like values in the reference type's domain.

---

Option F: Language support for default instance guards

An inline class declaration can indicate that the default instance is invalid. 
The compiler generates guards, as in Option E, at the start of all instance 
method bodies, and perhaps on all field accesses outside of those methods.

Programmers give up finer-grained control, but get more safety. I'm sure most 
would be happy with that trade.

Improper/separately-compiled bytecode can skip the field access checks, but 
that's a minor concern.

Same issues as Option E regarding adding a "new NPE" to the platform.

---

Option G: JVM support for default instance guards

Inline class files can indicate that their default instance is invalid. All 
attempts to operate on that instance (via field/method accesses, other than 
'withfield') result in an exception.

This tightens up Option F, making it just as impossible to access members of 
the default instance as it is to access members of 'null'.

Same issues as Option E regarding adding a "new NPE" to the platform.

---

Option H: Language checks on field/array reads

An inline class declaration can indicate that the default instance is invalid. 
Every field and array access that may involve an uninitialized field/array 
component of that inline type gets augmented with a check that rejects reads of 
the default value (treating it as "you forgot to initialize this variable").

That is:

Point p = points[3];

compiles to

Point p$0 = points[3];
if (p$0 == [vdefault Point]) throw new UninitializedVariableException();
Point p = p$0;

This is much like Option C, and has roughly the same advantages/problems. 
There's not a strong guarantee that the default value won't pop up from 
untrusted bytecode (or unreliable inline class authors), and lots of array 
types need guards.
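
The same check can be simulated in ordinary Java, as a sketch under some 
assumptions: 'Point' and 'GuardedPointArray' are hypothetical, parallel int 
arrays stand in for flattened storage, and 'UninitializedVariableException' is 
the made-up exception from the example above rather than a real Java SE class.

```java
// Hypothetical exception; not part of Java SE.
class UninitializedVariableException extends RuntimeException {}

// Hypothetical stand-in for an inline class whose default is declared invalid.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

// Simulates a flattened Point[] whose reads are guarded.
final class GuardedPointArray {
    private final int[] xs, ys;  // fresh storage is all zeros

    GuardedPointArray(int n) { xs = new int[n]; ys = new int[n]; }

    // Writes are unchecked in this sketch.
    void store(int i, Point p) {
        xs[i] = p.x;
        ys[i] = p.y;
    }

    // Reads reject the all-zeros value, treating it as "never initialized".
    Point load(int i) {
        if (xs[i] == 0 && ys[i] == 0) {
            throw new UninitializedVariableException();
        }
        return new Point(xs[i], ys[i]);
    }
}
```

Since writes are unchecked here, storing (0, 0) makes that component look 
uninitialized again on the next read.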

---

Option I: JVM checks on field/array reads

Inline class files can indicate that their default instance is invalid. When 
reading from a field/array component of the inline type 
('getfield'/'getstatic'/'aaload'), an exception is thrown if the default value 
is found (treating it as "you forgot to initialize this variable"). The 
'vdefault' instruction, like 'withfield', is illegal outside of the inline 
class's nest.

Better than Option H in that it can be optimized to occur on only certain 
reads, and in that it provides strong guarantees—only the inline class can ever 
"see" the default instance.

Well, unless the inline class chooses to share that instance with the world. 
Not sure how we prevent that. But maybe at that point, anything bad/weird that 
happens is the author's own fault. (E.g., putting the default value in an array 
will make that component effectively "uninitialized" again.)

Like Option D, there's a question of whether we're willing to add this 
complexity to the 'getfield'/'getstatic'/'aaload' instructions. My sense is 
that at least it's less complexity than you have in Option D.
