Evolving instance creation

Dan Smith Tue, 22 Feb 2022 13:17:17 -0800

One of the longstanding properties of class instance creation expressions ('new 
Foo()') is that the instance being produced is unique—that is, not '==' to any 
previously-created instance.
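
As a minimal sketch of that invariant in today's Java (the class 'Foo' and the helper 'sameIdentity' are hypothetical names used only for illustration), two creation expressions with the same arguments yield objects that are never '==', even when 'equals' says they are equivalent:

```java
// Sketch only: 'Foo' is a hypothetical identity class, not part of any proposal.
final class UniqueInstanceDemo {
    static final class Foo {
        final int x;

        Foo(int x) { this.x = x; }

        // Equality by content, to contrast with identity comparison via '=='.
        @Override public boolean equals(Object o) {
            return o instanceof Foo && ((Foo) o).x == x;
        }

        @Override public int hashCode() { return Integer.hashCode(x); }
    }

    // Helper that compares by identity.
    static boolean sameIdentity(Object a, Object b) { return a == b; }

    public static void main(String[] args) {
        Foo a = new Foo(1);
        Foo b = new Foo(1);
        System.out.println(sameIdentity(a, b)); // false: each 'new' yields a fresh object
        System.out.println(a.equals(b));        // true: same content
    }
}
```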


Value classes will disrupt this invariant, because it's possible to "create" an 
instance of a value class that already exists:

    new Point(1, 2) == new Point(1, 2) // always true

A related, possibly-overlapping new Java feature idea (not concretely proposed, 
but something the language might want in the future) is the declaration of 
canonical factory methods in a class, which intentionally *don't* promise 
unique instances (for example, they might implement interning). These factories 
would be like constructors in that they wouldn't have a unique method name, but 
otherwise would behave like ad hoc static factory methods—take some arguments, 
use them to create/locate an appropriate instance, return it.
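
For comparison, here is roughly what an ad hoc interning factory looks like in today's Java. This is only a sketch: the class 'InternedName' and its method 'of' are hypothetical, and the unnamed-factory syntax discussed above does not exist, so a named static method stands in for it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: a hypothetical class whose factory deliberately reuses instances.
final class InternedName {
    private static final Map<String, InternedName> CACHE = new ConcurrentHashMap<>();

    private final String text;

    // Private, so the factory is the only way to obtain an instance.
    private InternedName(String text) { this.text = text; }

    // Takes an argument, locates an existing instance or creates one, returns it.
    static InternedName of(String text) {
        return CACHE.computeIfAbsent(text, InternedName::new);
    }

    String text() { return text; }

    public static void main(String[] args) {
        System.out.println(InternedName.of("a") == InternedName.of("a")); // true
        System.out.println(InternedName.of("a") == InternedName.of("b")); // false
    }
}
```

Callers of 'of' get no unique instance promise, which is exactly the property such a factory gives up on purpose.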

I want to focus here on the usage of class instance creation expressions, and 
how to approach changes to their semantics. This involves balancing the needs 
of programmers who depend on the unique instance invariant with those who don't 
care and would prefer fewer knobs/less complexity.

Here are three approaches that I could imagine pursuing:

(1) Value classes are a special case for 'new Foo()'

This is the plan of record: the unique instance invariant continues to hold for 
'new Foo()' where Foo is an identity class, but if Foo is a value class, you 
might get an existing instance.

In bytecode, the translation of 'new Foo()' depends on the kind of class (as 
determined at compile time). Identity class creation continues to be 
implemented via 'new Foo; dup; invokespecial Foo.<init>()V'. Value class 
creation occurs via 'invokestatic Foo.<newvalue>()LFoo;' (method name 
bikeshedding tk). The two translations are not binary compatible (e.g., if an 
identity class becomes a value class, existing use sites must be recompiled).

In a way, it shouldn't be surprising that a value class doesn't guarantee 
unique instances, because uniqueness is closely tied to identity. So 
special-casing 'new Foo()' isn't that different from special-casing 
'Object.equals': in the absence of identity, we'll do something reasonable, but 
not quite the same.

Factories don't enter into this story at all. If we end up having unnamed 
factories in the future, they will be declared and invoked with a separate 
syntax, and will be declarable both by identity classes and value classes. 
(Value class factories don't seem particularly compelling, but they could, say, 
be used to smooth migration, like 'Integer.valueOf'.)

Biggest concerns: for now, it can be surprising that 'new' doesn't always give 
you a unique instance. In a future with factories, navigating between the 'new' 
syntax and the factory invocation syntax may be burdensome, with style wars 
about which approach is better.

(2) 'new Foo()' as a general-purpose creation tool

In this approach, 'new Foo()' is the use-site syntax for *both* factory and 
constructor invocation. Factories and constructors live in the same overload 
resolution "namespace", and all will be considered by the use site.

In bytecode, the preferred translation of 'new Foo()' is 'invokestatic 
Foo.<new>()LFoo;'. Note that this is the case for both value classes *and 
identity classes*. For compatibility, 'new/dup/<init>' also needs to be 
supported for now; eventually, it might be deprecated. Refactoring between 
constructors and factories is generally compatible.

Because this re-interpretation of 'new Foo()' supports factories, there is no 
unique instance invariant. At best, particular classes can document that they 
produce unique instances, and clients who need this behavior should ensure 
they're working with classes that promise it. (It's not as simple as looking 
for a *current* factory, because constructors can be refactored to factories.)

For developers who don't care about unique instances, this is the simplest 
approach: whenever you want an instance of Foo, you say 'new Foo()'.

Biggest concerns: we've demoted an ironclad semantic guarantee to an optional 
property of some classes. For those developers/use cases who care about the 
unique instance invariant, that may be difficult, especially because we're 
undoing a longstanding property rather than designing it this way from the 
beginning.
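
One illustration of a use case that leans on the guarantee today is the private sentinel idiom. The sketch below is an assumption about typical client code, not something taken from the proposal; 'SentinelDemo', 'ABSENT', and 'describe' are hypothetical names. It is only correct because 'new Object()' cannot be '==' to anything a caller supplies.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: shows code whose correctness depends on 'new' returning a fresh object.
final class SentinelDemo {
    // A fresh object: no value stored by a caller can ever be '==' to it.
    private static final Object ABSENT = new Object();

    // Distinguishes "no mapping" from "mapping to null" using the sentinel.
    static String describe(Map<String, Object> map, String key) {
        Object v = map.getOrDefault(key, ABSENT);
        if (v == ABSENT) {
            return "absent";
        }
        return v == null ? "present (null)" : "present";
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        map.put("k", null);
        System.out.println(describe(map, "k"));       // present (null)
        System.out.println(describe(map, "missing")); // absent
    }
}
```

If the class behind the sentinel could hand back a shared instance, the '==' check would no longer prove that the value came from this code.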

(3) 'new Foo()' for unique instances and just 'Foo()' otherwise

Here, the 'new' keyword is reserved for cases in which a unique instance is 
guaranteed. For value class creation, factory invocation, and constructor 
invocation when unique instances don't matter, a bare 'Foo()' call is used 
instead. 'new Point()' would be an error—this syntax doesn't work with value 
classes.

In bytecode, 'new Foo()' always compiles to 'new/dup/<init>', while plain 
'Foo()' typically compiles to 'invokestatic Foo.<make>()LFoo;' (method name 
bikeshedding tk). For compatibility, plain 'Foo()' would support 
'new/dup/<init>' invocations as well, if that's all the class provides. 
Refactoring between constructors and factories is generally compatible for 
plain 'Foo()' use sites, but not 'new Foo()' use sites.

The plain 'Foo()' would become the preferred style for general-purpose usage, 
while 'new Foo()' would (eventually, after a long migration period) signal an 
interest in the unique instance guarantee. Java code written with the updated 
style is a little lighter on "ceremony".

Biggest concerns: a somewhat arbitrary shift in coding style for all 
programmers to learn, which at a minimum must be adopted when working with 
value classes.

---

What are your thoughts about the significance of the unique instance invariant? 
Is it important enough to design instance creation syntax around it? Does either 
(2) or (3) above sound like a better destination than the plan of record?
