cloud-fan commented on code in PR #56237:
URL: https://github.com/apache/spark/pull/56237#discussion_r3334492207
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala:
##########
@@ -95,6 +95,11 @@ object ExtractValue {
case (MapType(_, _, _), _) => Left(GetMapValue(child, extraction))
+ // Extracting from NULL should return NULL, not throw an error. This can
happen when a
+ // column has NullType (e.g. from schema evolution with missing
columns), and aligns with
+ // standard SQL semantics where any operation on NULL yields NULL.
+ case (NullType, _) => Left(Literal(null, NullType))
Review Comment:
This is a behavior change to the shared `ExtractValue` utility, but only the
user-facing SELECT path is tested.
`extractValue`/`isExtractable`/`ExtractValue.apply` are consumed by ~10 call
sites — legacy `ResolveReferences`/`TempResolvedColumn` (HAVING/ORDER BY),
single-pass `NameScope` candidate filtering, `ResolveUnion.mergeFields` (union
by name), `higherOrderFunctions`, `PivotTransformer`, `VariableResolution`,
`DeserializerBuildHelper`, `NestedColumnAliasing` — several of which previously
relied on the throw as a validation signal. The most consequential side effect:
`isExtractable` now returns true for NullType, so in `NameScope`
(NameScope.scala:761,826) a NullType column becomes a valid extraction
candidate, broadening the candidate set and potentially changing which
attribute a multipart name binds to (or introducing new ambiguity). Could you
justify that NULL-propagation rather than the prior error is correct for the
non-SELECT consumers, or scope the fix to the user
-facing resolution point?
Secondary note: this arm discards `extraction`, so for
`col[expr]`/`col[keyExpr]` the index/key expression is never evaluated — unlike
the sibling `GetArrayItem`/`GetMapValue` arms above. The result is NULL
regardless under null-propagation, so this is benign, but it diverges from the
siblings.
##########
sql/core/src/test/resources/sql-tests/inputs/extract-value-nulltype.sql:
##########
@@ -0,0 +1,16 @@
+-- Test ExtractValue behavior with NullType base expressions.
+
+-- Struct field access on a NullType column
+SELECT col.a FROM (SELECT null AS col) t;
+
+-- Array index on a NullType column
+SELECT col[0] FROM (SELECT null AS col) t;
+
+-- Map key access on a NullType column
+SELECT col['key'] FROM (SELECT null AS col) t;
+
+-- HAVING with field access on NullType grouped column
+SELECT NAMED_STRUCT('a', 1) AS col1
+FROM VALUES (NULL)
+GROUP BY col1
+HAVING col1.a == 1;
Review Comment:
This case returns an empty result, but not for the reason it looks like.
`VALUES (NULL)` produces an aggregate *input* column default-named `col1` of
NullType, which shadows the `NAMED_STRUCT('a', 1) AS col1` SELECT alias. In
HAVING, `col1` resolves against the input column (via `TempResolvedColumn`), so
`col1.a` = `extractValue(NullType, "a")` -> `Literal(null)` -> `cast(null as
int) = 1` -> false -> empty.
The existing `having-and-order-by-recursive-type-name-resolution.sql` is
dedicated to exactly this shadowing scenario and deliberately throws
`INVALID_EXTRACT_BASE_FIELD_TYPE` for STRING/ARRAY/MAP input bases (e.g.
`SELECT NAMED_STRUCT('a',1) AS col1 FROM VALUES('a') ... HAVING col1.a > 0`)
precisely to surface the mis-binding. This PR makes NullType the lone base type
that silently swallows it — a user who writes `HAVING col1.a == 1` expecting
the row to pass gets empty output with no signal. Is that intended? If so,
please document the divergence and move this case next to the existing throwing
cases so the contrast is locked in. The simple struct/array/map cases above
would fit `extract-value-resolution-edge-cases.sql` or `extract.sql`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]