cloud-fan commented on code in PR #55981:
URL: https://github.com/apache/spark/pull/55981#discussion_r3308218671
##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtilsSuite.scala:
##########
@@ -37,4 +37,49 @@ class V2ExpressionUtilsSuite extends SparkFunSuite {
}
assert(exc.message.contains("v2Fun(a) ASC NULLS FIRST is not currently
supported"))
}
+
+ test("resolveRefs[NamedExpression] handles nested field references") {
+ val structType = StructType(Seq(StructField("inner", IntegerType, nullable
= false)))
+ val structAttr = AttributeReference("outer", structType, nullable =
false)()
+ val plan = LocalRelation(structAttr)
+ val nestedRef = FieldReference(Seq("outer", "inner"))
+
+ // A nested reference resolves to an Alias(GetStructField(...)), not an
AttributeReference,
+ // so the caller's checkcast fails when it dereferences an element as
AttributeReference.
+ // (The `asInstanceOf[T]` inside `resolveRef` is erased to the
`NamedExpression` upper bound,
+ // so the CCE fires at the call site, not inside the helper.)
+ intercept[ClassCastException] {
+ val resolvedAsAttr: AttributeReference =
+ V2ExpressionUtils.resolveRefs[AttributeReference](Seq(nestedRef),
plan).head
+ resolvedAsAttr
+ }
Review Comment:
The `intercept[ClassCastException]` block locks in a behaviour that's an
artefact of Scala generic erasure — the comment correctly explains it, but the
assertion documents the *bug* rather than the contract being added. If anyone
later changes the helper to do an explicit runtime check (the cleaner shape),
this test breaks for unrelated reasons. Drop the regression half; the positive
assertion on the `NamedExpression` shape below is what the PR needs to keep
passing.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteRowLevelCommand.scala:
##########
@@ -91,9 +91,9 @@ trait RewriteRowLevelCommand extends Rule[LogicalPlan] {
relation: DataSourceV2Relation,
operation: SupportsDelta): Seq[AttributeReference] = {
- val rowIdAttrs = V2ExpressionUtils.resolveRefs[AttributeReference](
+ val rowIdAttrs = V2ExpressionUtils.resolveRefs[NamedExpression](
operation.rowId.toImmutableArraySeq,
- relation)
+ relation).map(_.toAttribute.asInstanceOf[AttributeReference])
Review Comment:
The synthetic `AttributeReference` returned from `.toAttribute` has no link
to the underlying scan: for a nested ref like `outer.inner`, the scan still
produces `outer`, and nothing extracts `inner` from it.
`buildRelationWithAttrs` then folds this synthetic attribute into
`relation.output` as if it were a real column.
The structural fix is to keep `resolveRowIdAttrs` returning
`Seq[NamedExpression]` and wrap the read relation with a `Project` that aliases
each non-`Attribute` rowId expression to a top-level helper column — same
pattern as `ResolveChangelogTable.addStreamingNetChangeComputation`
(`ResolveChangelogTable.scala:706-710`):
```scala
val rowIdHelpers: Seq[Alias] = rowIdExprs.zipWithIndex.map { case (expr,
idx) =>
Alias(expr, NetChangesHelperColumns.rowIdColumn(idx))()
}
val withHelpers = Project(originalCols ++ rowIdHelpers, watermarked)
```
Flat case stays a no-op (no Project inserted, no fresh `ExprId`s); nested
case gets a real column with `GetStructField(...)` actually evaluated.
`WriteDelta.rowIdAttrsResolved` at `v2Commands.scala:500` is then correct as-is
— it's a structural check against the connector's flat
`rowIdProjection.schema`. The `.asInstanceOf[AttributeReference]` here and its
inconsistency with that site (which omits the cast) both go away once the
return type is widened.
Related: `requiredMetadataAttributes()` at line 85 above and
`filterAttributes()` at `DataSourceV2Relation.scala:185` are also
`NamedReference[]` but still use `resolveRefs[AttributeReference]`. Either
widen consistently across the row-level API surface, or document why nested
refs are only meaningful for rowIds.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]