szehon-ho commented on code in PR #53207:
URL: https://github.com/apache/spark/pull/53207#discussion_r2558582195


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveMergeIntoSchemaEvolution.scala:
##########
@@ -42,15 +45,19 @@ object ResolveMergeIntoSchemaEvolution extends 
Rule[LogicalPlan] {
       if (changes.isEmpty) {
         m
       } else {
-        m transformUpWithNewOutput {
-          case r @ DataSourceV2Relation(_: SupportsRowLevelOperations, _, _, 
_, _, _) =>
+        val finalAttrMapping = ArrayBuffer.empty[(Attribute, Attribute)]

Review Comment:
   There is a bug here: the rule actually hits _both_ the sourceTable and the targetTable and attempts schema evolution on both, when schema evolution should only ever be performed on the target table.
   
   I had done it this way because of a limitation of transformUpWithNewOutput: it does not re-map the attributes of the top-level object (MergeIntoTable). See https://github.com/apache/spark/pull/52866#discussion_r2512952222 for my finding. I had assumed that matching on a SupportsRowLevelOperations table would be enough to restrict schema evolution to the target, but I was wrong, so I added an extra rewriteAttrs call to rewrite the top-level object (MergeIntoTable).
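   To illustrate the shape of the bug outside of Spark, here is a self-contained Scala sketch (the `Relation`/`Merge` types below are hypothetical stand-ins, not catalyst classes): a purely type-based match during the transform rewrites every relation in the plan, while comparing against the target reference restricts evolution to the target only.

```scala
// Hypothetical miniature of the plan shapes involved; NOT catalyst classes.
case class Relation(name: String, schema: Seq[String])
case class Merge(target: Relation, source: Relation)

// Stand-in for schema evolution: add a new column to the relation's schema.
def evolve(r: Relation): Relation = r.copy(schema = r.schema :+ "newCol")

// Buggy shape: matching purely on node type evolves every Relation,
// so the source table gets evolved too.
def evolveByType(m: Merge): Merge =
  Merge(evolve(m.target), evolve(m.source))

// Fixed shape: only the node that is (reference-equal to) the target
// relation is rewritten; the source is left untouched.
def evolveTargetOnly(m: Merge): Merge = {
  def rewrite(r: Relation): Relation = if (r eq m.target) evolve(r) else r
  Merge(rewrite(m.target), rewrite(m.source))
}
```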



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala:
##########
@@ -916,19 +922,29 @@ case class MergeIntoTable(
       false
     } else {
       val actions = matchedActions ++ notMatchedActions
-      val assignments = actions.collect {
-        case a: UpdateAction => a.assignments
-        case a: InsertAction => a.assignments
-      }.flatten
-      val sourcePaths = DataTypeUtils.extractAllFieldPaths(sourceTable.schema)
-      assignments.forall { assignment =>
-        assignment.resolved ||
-          (assignment.value.resolved && sourcePaths.exists {
-            path => MergeIntoTable.isEqual(assignment, path)
-          })
+      val hasStarActions = actions.exists {

Review Comment:
   canEvaluateSchemaEvolution (which guards whether the schema evolution check is evaluated) can be called while updateStar and insertStar actions are still present. This is triggered via the DataFrame API and revealed a case I had missed, so here I return false to explicitly skip the check until those actions are resolved.
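   A minimal sketch of the guard's new shape (the action types below are simplified stand-ins, not the actual v2Commands definitions): when any star action is present, the guard returns false until the analyzer has expanded them.

```scala
// Simplified stand-ins for the merge actions; NOT the actual Spark classes.
sealed trait MergeAction
case class UpdateAction(resolved: Boolean) extends MergeAction
case object UpdateStarAction extends MergeAction
case object InsertStarAction extends MergeAction

// Sketch of the guard: star actions cannot be validated until the
// analyzer expands them, so their presence makes the guard bail out.
def canEvaluateSchemaEvolution(actions: Seq[MergeAction]): Boolean = {
  val hasStarActions = actions.exists {
    case UpdateStarAction | InsertStarAction => true
    case _                                   => false
  }
  if (hasStarActions) false // skip explicitly until stars are resolved
  else actions.forall { case UpdateAction(r) => r; case _ => true }
}
```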



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

