cloud-fan commented on code in PR #56237:
URL: https://github.com/apache/spark/pull/56237#discussion_r3334492207


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala:
##########
@@ -95,6 +95,11 @@ object ExtractValue {
 
       case (MapType(_, _, _), _) => Left(GetMapValue(child, extraction))
 
+      // Extracting from NULL should return NULL, not throw an error. This can 
happen when a
+      // column has NullType (e.g. from schema evolution with missing 
columns), and aligns with
+      // standard SQL semantics where any operation on NULL yields NULL.
+      case (NullType, _) => Left(Literal(null, NullType))

Review Comment:
   This is a behavior change to the shared `ExtractValue` utility, but only the 
user-facing SELECT path is tested. 
`extractValue`/`isExtractable`/`ExtractValue.apply` are consumed by ~10 call 
sites — legacy `ResolveReferences`/`TempResolvedColumn` (HAVING/ORDER BY), 
single-pass `NameScope` candidate filtering, `ResolveUnion.mergeFields` (union 
by name), `higherOrderFunctions`, `PivotTransformer`, `VariableResolution`, 
`DeserializerBuildHelper`, `NestedColumnAliasing` — several of which previously 
relied on the throw as a validation signal. The most consequential side effect: 
`isExtractable` now returns true for NullType, so in `NameScope` 
(NameScope.scala:761,826) a NullType column becomes a valid extraction 
candidate, broadening the candidate set and potentially changing which 
attribute a multipart name binds to (or introducing new ambiguity). Could you 
justify that NULL-propagation rather than the prior error is correct for the 
non-SELECT consumers, or scope the fix to the user
 -facing resolution point?
   
   Secondary note: this arm discards `extraction`, so for 
`col[expr]`/`col[keyExpr]` the index/key expression is never evaluated — unlike 
the sibling `GetArrayItem`/`GetMapValue` arms above. The result is NULL 
regardless under null-propagation, so this is benign, but it diverges from the 
siblings.



##########
sql/core/src/test/resources/sql-tests/inputs/extract-value-nulltype.sql:
##########
@@ -0,0 +1,16 @@
+-- Test ExtractValue behavior with NullType base expressions.
+
+-- Struct field access on a NullType column
+SELECT col.a FROM (SELECT null AS col) t;
+
+-- Array index on a NullType column
+SELECT col[0] FROM (SELECT null AS col) t;
+
+-- Map key access on a NullType column
+SELECT col['key'] FROM (SELECT null AS col) t;
+
+-- HAVING with field access on NullType grouped column
+SELECT NAMED_STRUCT('a', 1) AS col1
+FROM VALUES (NULL)
+GROUP BY col1
+HAVING col1.a == 1;

Review Comment:
   This case returns an empty result, but not for the reason it looks like. 
`VALUES (NULL)` produces an aggregate *input* column default-named `col1` of 
NullType, which shadows the `NAMED_STRUCT('a', 1) AS col1` SELECT alias. In 
HAVING, `col1` resolves against the input column (via `TempResolvedColumn`), so 
`col1.a` = `extractValue(NullType, "a")` -> `Literal(null)` -> `cast(null as 
int) = 1` -> false -> empty.
   
   The existing `having-and-order-by-recursive-type-name-resolution.sql` is 
dedicated to exactly this shadowing scenario and deliberately throws 
`INVALID_EXTRACT_BASE_FIELD_TYPE` for STRING/ARRAY/MAP input bases (e.g. 
`SELECT NAMED_STRUCT('a',1) AS col1 FROM VALUES('a') ... HAVING col1.a > 0`) 
precisely to surface the mis-binding. This PR makes NullType the lone base type 
that silently swallows it — a user who writes `HAVING col1.a == 1` expecting 
the row to pass gets empty output with no signal. Is that intended? If so, 
please document the divergence and move this case next to the existing throwing 
cases so the contrast is locked in. The simple struct/array/map cases above 
would fit `extract-value-resolution-edge-cases.sql` or `extract.sql`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to