cloud-fan commented on code in PR #55981:
URL: https://github.com/apache/spark/pull/55981#discussion_r3308218671


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtilsSuite.scala:
##########
@@ -37,4 +37,49 @@ class V2ExpressionUtilsSuite extends SparkFunSuite {
     }
     assert(exc.message.contains("v2Fun(a) ASC NULLS FIRST is not currently 
supported"))
   }
+
+  test("resolveRefs[NamedExpression] handles nested field references") {
+    val structType = StructType(Seq(StructField("inner", IntegerType, nullable 
= false)))
+    val structAttr = AttributeReference("outer", structType, nullable = 
false)()
+    val plan = LocalRelation(structAttr)
+    val nestedRef = FieldReference(Seq("outer", "inner"))
+
+    // A nested reference resolves to an Alias(GetStructField(...)), not an 
AttributeReference,
+    // so the caller's checkcast fails when it dereferences an element as 
AttributeReference.
+    // (The `asInstanceOf[T]` inside `resolveRef` is erased to the 
`NamedExpression` upper bound,
+    // so the CCE fires at the call site, not inside the helper.)
+    intercept[ClassCastException] {
+      val resolvedAsAttr: AttributeReference =
+        V2ExpressionUtils.resolveRefs[AttributeReference](Seq(nestedRef), 
plan).head
+      resolvedAsAttr
+    }

Review Comment:
   The `intercept[ClassCastException]` block locks in a behaviour that's an 
artefact of Scala generic erasure — the comment correctly explains it, but the 
assertion documents the *bug* rather than the contract being added. If anyone 
later changes the helper to do an explicit runtime check (the cleaner shape), 
this test breaks for unrelated reasons. Drop the regression half; the positive 
assertion on the `NamedExpression` shape below is what the PR needs to keep 
passing.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteRowLevelCommand.scala:
##########
@@ -91,9 +91,9 @@ trait RewriteRowLevelCommand extends Rule[LogicalPlan] {
       relation: DataSourceV2Relation,
       operation: SupportsDelta): Seq[AttributeReference] = {
 
-    val rowIdAttrs = V2ExpressionUtils.resolveRefs[AttributeReference](
+    val rowIdAttrs = V2ExpressionUtils.resolveRefs[NamedExpression](
       operation.rowId.toImmutableArraySeq,
-      relation)
+      relation).map(_.toAttribute.asInstanceOf[AttributeReference])

Review Comment:
   The synthetic `AttributeReference` returned from `.toAttribute` has no link 
to the underlying scan: for a nested ref like `outer.inner`, the scan still 
produces `outer`, and nothing extracts `inner` from it. 
`buildRelationWithAttrs` then folds this synthetic attribute into 
`relation.output` as if it were a real column.
   
   The structural fix is to keep `resolveRowIdAttrs` returning 
`Seq[NamedExpression]` and wrap the read relation with a `Project` that aliases 
each non-`Attribute` rowId expression to a top-level helper column — same 
pattern as `ResolveChangelogTable.addStreamingNetChangeComputation` 
(`ResolveChangelogTable.scala:706-710`):
   
   ```scala
   val rowIdHelpers: Seq[Alias] = rowIdExprs.zipWithIndex.map { case (expr, 
idx) =>
     Alias(expr, NetChangesHelperColumns.rowIdColumn(idx))()
   }
   val withHelpers = Project(originalCols ++ rowIdHelpers, watermarked)
   ```
   
   Flat case stays a no-op (no Project inserted, no fresh `ExprId`s); nested 
case gets a real column with `GetStructField(...)` actually evaluated. 
`WriteDelta.rowIdAttrsResolved` at `v2Commands.scala:500` is then correct as-is 
— it's a structural check against the connector's flat 
`rowIdProjection.schema`. The `.asInstanceOf[AttributeReference]` here and its 
inconsistency with that site (which omits the cast) both go away once the 
return type is widened.
   
   Related: `requiredMetadataAttributes()` at line 85 above and 
`filterAttributes()` at `DataSourceV2Relation.scala:185` are also 
`NamedReference[]` but still use `resolveRefs[AttributeReference]`. Either 
widen consistently across the row-level API surface, or document why nested 
refs are only meaningful for rowIds.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to