AnishMahto commented on code in PR #56042:
URL: https://github.com/apache/spark/pull/56042#discussion_r3293577669


##########
sql/pipelines/src/main/scala/org/apache/spark/sql/pipelines/graph/Flow.scala:
##########
@@ -194,3 +243,109 @@ class AppendOnceFlow(
 
   override val once = true
 }
+
+/**
+ * A resolved flow that applies a CDC event stream to a target table via MERGE, in accordance to
+ * the configured [[flow.changeArgs]].
+ */
+class AutoCdcMergeFlow(
+    val flow: AutoCdcFlow,
+    val funcResult: FlowFunctionResult
+) extends ResolvedFlow {
+  requireReservedPrefixAbsentInSourceColumns()
+
+  def changeArgs: ChangeArgs = flow.changeArgs
+
+  /**
+   * Returns the augmented output schema of this flow, which can differ from the schema of the
+   * source change-data-feed dataframe.
+   *
+   * The source dataframe's schema describes the incoming CDC events; the augmented schema here
+   * applies the user-specified [[ColumnSelection]] and appends the SCD-specific metadata
+   * columns that the AutoCDC MERGE engine projects onto the target table. Downstream
+   * dependencies in the pipeline see this augmented schema.
+   */
+  override val schema: StructType = {

Review Comment:
   This is a great callout.
   
   Since AutoCDC flows must write to a streaming table (MVs, temp views, and
persisted views are all invalid targets), `AutoCdcMergeFlow.load` is never
actually called.
   
   But the comment is right that without an override, the inherited
`AutoCdcMergeFlow.load` implementation is incorrect. Regardless of whether it's
called at runtime today, we should make sure the contract is correct.
   
   I added a meaningful override and left a comment explaining this. I also
added tests to demonstrate that AutoCDC flows cannot write to an MV, a
persisted view, or a temp view.
   
   As a side note, I think this is a great example of why good inheritance is 
difficult to get right and more often than not adds unnecessary coupling 😛.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

