On Jun 1, 2023, at 4:10 PM, Kevin Bourrillion <[email protected]> wrote:

I'm wondering why we shouldn't require fields of non-nullable value-class types 
to be explicitly initialized. `Complex x = new Complex(0, 0)` or `Complex x = 
new Complex()`. I'll stipulate "people would grumble" as self-evident.

Just catching up on this...

How I read this is that you can imagine (and would prefer) an alternative Java 
language design in which primitive-typed variables (int/boolean/double/etc.) 
are always required to be explicitly initialized before use. So you object to 
Brian's framing of "uninitialized use" as a good thing.

This isn't crazy, because it's precisely the approach we take to 
primitive-typed local variables. It's only fields and arrays that get 
implicitly initialized.
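
To make that split concrete, here is a small sketch in today's Java (the class and field names are made up for illustration). Fields and array elements can be read without ever being written, while javac's definite assignment rules reject the same read of a local:

```java
public class DefaultsDemo {
    // Fields are implicitly initialized to the default value of their type.
    static int staticCounter;
    double instanceValue;
    boolean flag;

    public static void main(String[] args) {
        DefaultsDemo d = new DefaultsDemo();
        int[] cells = new int[3];             // array elements are implicitly zeroed

        System.out.println(staticCounter);    // 0
        System.out.println(d.instanceValue);  // 0.0
        System.out.println(d.flag);           // false
        System.out.println(cells[1]);         // 0

        int local;
        // System.out.println(local);         // rejected by javac: local might not have been initialized
        local = 42;
        System.out.println(local);            // 42
    }
}
```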

If we set aside historical baggage and expectations, I think the biggest 
problem with this alternative language is that it's hard to enforce in 
bytecode. We could handle it in one of two ways:

A) Dynamically check & fail if a primitive variable hasn't been written yet. 
That would require an extra indirection or "uninitialized" metadata flag. 
Either way, this is equivalent to finding an encoding for a nullable primitive 
type, which intolerably increases footprint. So that's out.

B) Statically guarantee (not just *encourage*) primitive variables are written 
before use. Three challenges here, in increasing order of difficulty:

    i) Arrays must be initialized on creation. The instruction set makes this 
difficult to prove, so we'd probably handle this with a trusted API that either 
has a native implementation or gets special permission for its bytecode to 
violate this rule. Ideally, this API would be optimized for comparable 
performance to today's newarray, at least when the initial value is 0. One 
future problem here is that it doesn't generalize: generic algorithms don't 
know what initial value to use, so they need to be parameterized by something to use 
as default, leading to API distortions. (Not an issue in today's Java, but will 
become one in the future.)

    ii) Instance fields must be written by <init> methods before publishing 
'this'. If the class allows subclasses, then that means <init> methods cannot 
publish 'this' at all. Presumably this implies no method calls involving 
'this', because we're not going to want to track what happens inside the 
method. Keep in mind that <init> is ad hoc imperative code—there's no such 
thing as a "field initializer" in bytecode. Also keep in mind that one of the 
most difficult/unpopular features of bytecode verification is the way it tries 
to track the initialization state of partially-constructed objects.

    iii) Static fields must be written by <clinit> methods before anyone tries 
to access them. The class initialization protocol makes this effectively 
impossible—the first time a not-yet-loaded class gets mentioned in the body of 
<clinit>, a class loader starts running user code, and there's no guarantee 
that from there somebody won't call back to look at a not-yet-initialized field.
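
For what it's worth, both (ii) and (iii) are easy to observe in today's Java. The sketch below uses made-up class names and a deliberately simplified initialization cycle (no class loader code involved), so treat it as an illustration of the read-before-write window rather than of the full protocol:

```java
public class ReadBeforeWriteDemo {
    // (ii) The superclass constructor calls an overridable method, so subclass
    // code runs on 'this' before the subclass field has been written.
    static class Base {
        final int observed;
        Base() { observed = peek(); }
        int peek() { return -1; }
    }

    static class Derived extends Base {
        final int size;
        Derived(int size) {
            super();            // peek() runs in here, before the next line
            this.size = size;
        }
        @Override int peek() { return size; }
    }

    // (iii) A's initializer touches B, whose initializer reads back into A
    // while A is still only partially initialized.
    static class A {
        static final int VALUE = compute();   // a method call, so not a compile-time constant
        static int compute() { return B.SEEN_A + 1; }
    }

    static class B {
        static final int SEEN_A = A.VALUE;
    }

    public static void main(String[] args) {
        Derived d = new Derived(7);
        System.out.println(d.observed);  // 0: the default value leaked out
        System.out.println(d.size);      // 7

        int a = A.VALUE;                 // touch A first to start the cycle from A
        System.out.println(a);           // 1
        System.out.println(B.SEEN_A);    // 0: B saw A.VALUE before it was written
    }
}
```

Note that the result of the static case depends on which class gets initialized first; if B were touched before A, B.SEEN_A would come out as 1 instead.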

Neither (A) nor (B) seems particularly viable, which leads to the default value 
solution. Now, we could quibble about how strongly the language should 
discourage reads-before-writes of fields and arrays, but the point is that 
there has to be *something* explaining what you see in the corner case in which 
you read before initialization.

Similarly, if we want to allow value classes to avoid the footprint overhead of 
(A), we need (some of) them to support default values.

All that said, I do think there's room for disagreement about whether a 
read-before-write scenario should be considered normal behavior (Brian's 
presentation) or a program bug that we unfortunately can't always prevent. One 
argument for de-emphasizing the behavior of races and the opt-in to tearing is 
that these behaviors, though they must be specified, are firmly in the "program 
bug" category.
