HeartSaVioR commented on code in PR #56546:
URL: https://github.com/apache/spark/pull/56546#discussion_r3423984712


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala:
##########
@@ -2312,10 +2312,29 @@ case class OneRowRelation() extends LeafNode {
   }
 }
 
+/**
+ * The original recipe behind a [[Deduplicate]] / [[DeduplicateWithinWatermark]] node, set by the
+ * `ResolveDeduplicate` analyzer rule and retained so a streaming query can recompute its key
+ * attributes at query start in the ordering pinned in the offset log (see
+ * `ResolveDeduplicate.computeKeys`). A `None` spec on a node means it was not built from
+ * `dropDuplicates*` (e.g. an internally/test-constructed node) and its keys must NOT be recomputed.
+ *
+ * @param subset the user-requested subset of column names (ignored when `allColumnsAsKeys`).
+ * @param allColumnsAsKeys when true, every column of the child is a deduplication key.
+ * @param viaSparkClassic whether this was built via Spark Classic (`Dataset.dropDuplicates*`, true)
+ *   or Spark Connect (`transformDeduplicate`, false). Only consulted when recomputing the keys in
+ *   the legacy order, where the two engines historically differed. See SPARK-57489.
+ */
+case class DeduplicateSpec(
+    subset: Seq[String],

Review Comment:
   Ah, I thought we used `subset` as the parameter name in the DataFrame API, but we don't.
   Since the API uses `colNames`, I'll use that.
   
   For the second part, I followed the approach used in Spark Connect, but the suggestion
   makes total sense. Thanks for the suggestion!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

