cloud-fan commented on code in PR #56237:
URL: https://github.com/apache/spark/pull/56237#discussion_r3338533796
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala:
##########
@@ -396,7 +396,13 @@ package object expressions {
// Then this will add ExtractValue("c", ExtractValue("b", a)), and
alias the final
// expression as "c".
val fieldExprs = nestedFields.foldLeft(a: Expression) { (e, name) =>
- ExtractValue(e, Literal(name), resolver)
+ // Extracting a field from a NULL (NullType) base yields NULL (SQL
NULL propagation),
+ // rather than throwing INVALID_EXTRACT_BASE_FIELD_TYPE. A
NullType column can arise
+ // e.g. from schema evolution with missing columns. This is scoped
to dotted
+ // multipart field access (`col.a`); `col['key']`/`col[0]` go
through
+ // UnresolvedExtractValue and still throw.
+ if (e.dataType == NullType) Literal(null, NullType)
Review Comment:
This fold is reached by the **legacy** analyzer, but the single-pass
`Resolver` takes a different route to the same method and never reaches this
branch, so the fix is legacy-only.
- Legacy: `AttributeSeq.getCandidatesForResolution` (package.scala:361-377)
does **not** filter candidates by `isExtractable`, so the NullType candidate
reaches this fold and returns `Literal(null)`.
- Single-pass: `NameScope.getCandidatesForResolution`
(NameScope.scala:759-766) filters candidates through
`ExtractValue.isExtractable` **before** calling `resolveCandidates`
(NameScope.scala:729). `isExtractable(NullType, ["a"])` is `false` (the
`extractValue(NullType, _)` arm returns the error,
complexTypeExtractors.scala:98), so the candidate is dropped,
`resolveCandidates` receives an empty set, and the name fails to resolve — the
single-pass analyzer **throws**.
Net effect: `SELECT col.a FROM (SELECT null AS col)` returns a NULL row
under the legacy analyzer (default) but throws under
`spark.sql.analyzer.singlePassResolver.enabled`, and under dual-run sampling
`HybridAnalyzer` (HybridAnalyzer.scala:43-51) throws the single-pass failure
even though the legacy run succeeded. Round 1's `isExtractable`-changing
approach didn't have this divergence — both analyzers would have returned NULL.
Could you apply the fix consistently to both analyzers (e.g. make
`isExtractable` treat a fully-NullType extraction chain as extractable so the
single-pass path reaches the same NULL result, or handle NullType in the
single-pass multipart path), or — if legacy-only is intended — justify it and
add single-pass coverage that locks in the behavior?
##########
sql/core/src/test/resources/sql-tests/inputs/extract-value-resolution-edge-cases.sql:
##########
@@ -8,3 +8,12 @@ SELECT col1.a, a FROM t1 ORDER BY col1.a;
SELECT split(col1, '-')[1] AS a FROM VALUES('a-b') ORDER BY split(col1,
'-')[1];
DROP TABLE t1;
+
+-- SPARK-57186: dotted multipart field access on a NullType base returns NULL
instead of throwing
+-- INVALID_EXTRACT_BASE_FIELD_TYPE (NULL propagation; a NullType column can
arise e.g. from schema
+-- evolution with missing columns). The fix is scoped to multipart name
resolution (`col.a`); the
+-- `col['key']` and `col[0]` forms go through UnresolvedExtractValue and still
throw, so the
+-- shared ExtractValue utility and the single-pass resolver are unaffected.
+SELECT col.a FROM (SELECT null AS col) t;
+SELECT col[0] FROM (SELECT null AS col) t;
Review Comment:
Non-blocking: for a NullType base, `col.a` now returns NULL while the
equivalent subscript forms `col[0]`/`col['key']` still throw
`INVALID_EXTRACT_BASE_FIELD_TYPE`. `col.a` and `col['a']` are equivalent
struct-field-access syntaxes elsewhere in Spark, so this asymmetry is worth
confirming as intentional. It's largely tied to the analyzer-divergence finding
above — a consistent cross-analyzer fix is the natural point to decide whether
the subscript forms should follow `col.a` here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]