On Jun 1, 2023, at 4:10 PM, Kevin Bourrillion <[email protected]> wrote:
I'm wondering why we shouldn't require fields of non-nullable value-class types
to be explicitly initialized. `Complex x = new Complex(0, 0)` or `Complex x =
new Complex()`. I'll stipulate "people would grumble" as self-evident.
Just catching up on this...
How I read this is that you can imagine (and would prefer) an alternative Java
language design in which primitive-typed variables (int/boolean/double/etc.)
are always required to be explicitly initialized before use. So you object to
Brian's framing of "uninitialized use" as a good thing.
This isn't crazy, because it's precisely the approach we take to
primitive-typed local variables. It's only fields and arrays that get
implicitly initialized.
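
To make that contrast concrete, here is a minimal sketch in today's Java. The class and member names are invented for illustration. The compiler rejects a read of an unassigned local, while fields and array elements quietly start out at their zero-like defaults.

```java
// Illustration only: Defaults and its members are hypothetical names.
class Defaults {
    int count;            // instance field, never explicitly written
    boolean flag;         // instance field, never explicitly written
    static double ratio;  // static field, never explicitly written

    static int localExample() {
        int x;
        // return x;      // rejected by javac: x is not definitely assigned here
        x = 42;
        return x;         // fine: x has been written on every path to this point
    }

    public static void main(String[] args) {
        Defaults d = new Defaults();
        int[] a = new int[3];              // elements are implicitly initialized
        System.out.println(d.count);       // 0
        System.out.println(d.flag);        // false
        System.out.println(Defaults.ratio); // 0.0
        System.out.println(a[1]);          // 0
        System.out.println(localExample()); // 42
    }
}
```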
If we set aside historical baggage and expectations, I think the biggest
problem with this alternative language is that it's hard to enforce in
bytecode. We could handle it in one of two ways:
A) Dynamically check & fail if a primitive variable hasn't been written yet.
That would require an extra indirection or "uninitialized" metadata flag.
Either way, this is equivalent to finding an encoding for a nullable primitive
type, which intolerably increases footprint. So that's out.
B) Statically guarantee (not just *encourage*) primitive variables are written
before use. Three challenges here, in increasing order of difficulty:
i) Arrays must be initialized on creation. The instruction set makes this
difficult to prove, so we'd probably handle this with a trusted API that either
has a native implementation or gets special permission for its bytecode to
violate this rule. Ideally, this API would be optimized for comparable
performance to today's newarray, at least when the initial value is 0. One
future problem here is that it doesn't generalize: generic algorithms don't
know what initial value to use, so need to be parameterized by something to use
as default, leading to API distortions. (Not an issue in today's Java, but will
become one in the future.)
ii) Instance fields must be written by <init> methods before publishing
'this'. If the class allows subclasses, then that means <init> methods cannot
publish 'this' at all. Presumably this implies no method calls involving
'this', because we're not going to want to track what happens inside the
method. Keep in mind that <init> is ad hoc imperative code—there's no such
thing as a "field initializer" in bytecode. Also keep in mind that one of the
most difficult/unpopular features of bytecode verification is the way it tries
to track the initialization state of partially-constructed objects.
iii) Static fields must be written by <clinit> methods before anyone tries
to access them. The class initialization protocol makes this effectively
impossible—the first time a not-yet-loaded class gets mentioned in the body of
<clinit>, a class loader starts running user code, and there's no guarantee
that from there somebody won't call back to look at a not-yet-initialized field.
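
Challenges (ii) and (iii) can both be observed in today's Java. The sketch below uses made-up class names. The static case is a plain initialization cycle, which is simpler than the class-loader scenario described above, but the visible effect is the same: somebody reads a field before its initializer has written it and sees the default.

```java
// Hypothetical names throughout; a sketch of read-before-write in today's Java.

// (ii) 'this' reaches an overridden method before the subclass field is written.
class Base {
    final int seen;
    Base() {
        seen = peek();    // virtual call on a partially constructed object
    }
    int peek() { return -1; }
}

class Derived extends Base {
    private final int size;
    Derived(int size) {
        super();          // runs Base(), which dispatches to Derived.peek()
        this.size = size; // too late for the call made from Base()
    }
    @Override int peek() { return size; }
}

// (iii) A cycle between class initializers exposes a static field's default.
class Left {
    static final int VALUE;
    static {
        Right.touch();    // first use of Right: its initializer runs now
        VALUE = 7;
    }
}

class Right {
    // Left is already being initialized by this thread, so this read does not
    // wait for Left to finish; it just sees whatever VALUE currently holds.
    static final int SNAPSHOT = Left.VALUE;
    static void touch() {}
}

class ReadBeforeWrite {
    public static void main(String[] args) {
        Derived d = new Derived(5);
        System.out.println(d.seen);         // 0, not 5
        System.out.println(d.peek());       // 5, once construction has finished
        System.out.println(Left.VALUE);     // 7 (this use initializes Left first)
        System.out.println(Right.SNAPSHOT); // 0, captured before VALUE was written
    }
}
```

Note that the static result depends on which class is touched first: if Right were initialized before Left, SNAPSHOT would be 7. That order dependence is part of why a static guarantee is so hard to state.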
Neither (A) nor (B) seems particularly viable, which leads to the default value
solution. Now, we could quibble about how strongly the language should
discourage reads-before-writes of fields and arrays, but the point is that
there has to be *something* explaining what you see in the corner case in which
you read before initialization.
Similarly, if we want to allow value classes to avoid the footprint overhead of
(A), we need (some of) them to support default values.
All that said, I do think there's room for disagreement about whether a
read-before-write scenario should be considered normal behavior (Brian's
presentation) or a program bug that we unfortunately can't always prevent. One
argument for de-emphasizing the behavior of races and the opt-in to tearing is
that these behaviors, though they must be specified, are firmly in the "program
bug" category.