On May 11, 2018, at 7:39 AM, Frederic Parain <[email protected]> wrote:
>
> John,
>
> I have a question about the semantics within legacy class files (class
> files lacking a ValueTypes attribute). Your document clarifies the
> semantics for fields as follows:
>
> "Meanwhile, C is allowed to putfield and getfield null all day long into its
> own fields (and fields of other benighted legacy classes that it may be
> friends with). Thus, the getfield and putfield instructions link to slightly
> different behavior, not only based on the format of the field, but also based
> on “who’s asking”. Code in C is allowed to witness nulls in its Q fields, but
> code in A (upgraded) is not allowed to see them, even though it’s the same
> getfield to the same symbolic reference. Happily, fields are not shared
> widely across uncoordinated classfiles, so this is a corner case mainly for
> testers to worry about.”
>
> But what about arrays? If I follow the same logic that “old code needs to
> be left undisturbed if possible”, and a legacy class C creates an array of Q
> without knowing that Q is now a value type, C would expect to be allowed
> to write and read null from this array, as it does from its own fields. Is it
> a correct assumption?
Yes, I avoided this question in the write-up.

To apply the same move as fields, we could try to say that arrays of type V[]
created by a legacy class C do not reject nulls, while arrays of type V[]
created by normal classes (which recognize V as a value type) are created as
flattened. But the analogy between fields and array elements doesn't work in
this case. While a class C can only define fields in itself, by creating
arrays it is working with a common global type. Think of V[] as a global
type, and you'll see that it needs a global definition of what is flattened
and what is nullable.

I think we will get away with migrating types and declaring that legacy
classes which use arrays of those types will fail. The mode of failure needs
engineering via experiment. We could go so far as to reject legacy classes
that use anewarray to build arrays of a value type without putting that type
on their ValueTypes list. This means that if there is a current class C out
there creating arrays of type Optional[] or LocalDate[], and one of those
types is migrated to a value type, then C becomes a legacy class and will
probably fail to operate correctly. OTOH, since those classes use factories
to create non-null values of type Optional or LocalDate, such a legacy class
is likely to refrain from using nulls. I think it's possible but not likely
that the author of a legacy class will make some clever use of nulls, storing
them into an array of upgraded type V.

In the end, some legacy code will not port forward without recompilation and
even recoding. Let's do what we can to make it easier to diagnose and upgrade
such code, as long as it doesn't hurt the basic requirement of making values
flattenable. The idea of making fields nullable seems a reasonable low-cost
compromise, but making array elements nullable carries a much higher cost.
Any need for a boxy or nullable array is more easily served by an explicit
reference array, of type Object[] or ValueRef<VT>[]. Overloading that
behavior into V[] is asking for long-term trouble with performance surprises.
Erased Object or interface arrays will fill this gap just as well as a
first-class nullable VT.BOX[], with few exceptions. I think those exceptions
are manageable by other means than complicating (un-flattening) the basic
data types of the VM.

> This would mean that the JVM would have to make the distinction between
> an array of nullable elements, and an array of non-nullable elements.

We could try this, but let's prove that it's worth the trouble before pulling
on that string. I'm proposing Object[] and ValueRef<V>[] as workaround types.

> Which could be a good thing if we want to catch leaking of arrays with
> potentially null elements from old code to new code, instead of waiting
> for new code to access a null element to throw an exception.

Why not try to catch the problem when the array is created? Have the
anewarray instruction do a cross check (like CLCs) between the base type of
the array and the local ValueTypes attribute.

> On the other hand, the lazy check solution allows arrays of non-nullable
> elements with zero null elements to work fine with new code.

So, we have discussed the alternative of adding extra polymorphism to all
value array types: some arrays are flat and reject nulls, while others are
boxy and accept nulls. But here again I want to push back against inventing a
parallel set of boxy implementations, because it's a long-term systemic cost
for a short-term marginal gain.
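To make the workaround concrete, here is a minimal sketch, assuming LocalDate
has migrated to a value type (the class name is just for illustration, and the
exact failure mode of the commented-out null store is exactly what needs
engineering via experiment):

    import java.time.LocalDate;

    class NullableArrayWorkaround {
        static void sketch() {
            // Value-aware code creates a flattened LocalDate[]; once LocalDate
            // is a value type, its elements have no room for nulls.
            LocalDate[] flat = new LocalDate[10];
            flat[0] = LocalDate.now();   // fine: a real value from a factory
            // flat[1] = null;           // expected to fail; exact behavior TBD

            // Code that genuinely needs "value or null" per element says so
            // with an erased reference array, instead of un-flattening
            // LocalDate[]:
            Object[] boxy = new Object[10];
            boxy[0] = flat[0];           // stored as a reference (boxed/buffered)
            boxy[1] = null;              // fine: null is within Object[]'s domain
        }
    }

The flattened array keeps its density; anything that needs per-element
nullability opts into references explicitly.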
Besides, some library classes don't use native anewarray but use
jlr.Array.newInstance to make arrays. Do we make that guy caller-sensitive so
he can tell which kind of array to make? I think this is a long string to
pull on. It's easier to define something as "clearly in error" (see above)
than to try to fix it on the fly, because you probably have to fix more and
more stuff, and keep track of the fixes. Like I say, long-term cost for
marginal migration improvements.

> From an implementation point of view, the JVM already has to make the
> distinction between flattened and not flattened arrays, so there's logic
> in place to detect some internal constraints of arrays, but the
> nullable/non-nullable element semantics would require one additional bit.

We *can* do this, but we shouldn't, because (a) it's a long string to pull
on, in service of a user model that is ultimately disappointing, and (b) it
means that even optimized code, twenty years from now, will have to deal with
this extra polymorphism.

— John
