[GitHub] [spark] ulysses-you commented on a diff in pull request #36117: [SPARK-38832][SQL] Remove unnecessary distinct in aggregate expression by distinctKeys

GitBox Tue, 19 Apr 2022 18:59:07 -0700


ulysses-you commented on code in PR #36117:
URL: https://github.com/apache/spark/pull/36117#discussion_r853662688



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlanDistinctKeys.scala:
##########
@@ -29,6 +29,12 @@ import org.apache.spark.sql.internal.SQLConf.PROPAGATE_DISTINCT_KEYS_ENABLED
  */
 trait LogicalPlanDistinctKeys { self: LogicalPlan =>
   lazy val distinctKeys: Set[ExpressionSet] = {
-    if (conf.getConf(PROPAGATE_DISTINCT_KEYS_ENABLED)) DistinctKeyVisitor.visit(self) else Set.empty
+    if (conf.getConf(PROPAGATE_DISTINCT_KEYS_ENABLED)) {
+      val keys = DistinctKeyVisitor.visit(self)
+      require(keys.forall(_.nonEmpty))

Review Comment:
   I think it's more about guarding against unexpected behavior. It would be a 
correctness issue if another operator returned an empty distinct key. And as you 
mentioned, the global aggregate is already optimized by `EliminateDistinct` 
and `OptimizeOneRowPlan`, so is it fine?
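
To illustrate the concern, here is a minimal sketch of why an empty distinct key can be dangerous. It assumes a consumer that treats a DISTINCT as redundant when some distinct key is a subset of the referenced columns. `Key`, `distinctIsRedundant`, and `validated` are hypothetical stand-ins, with plain `Set[String]` in place of `ExpressionSet`; this is not Spark's actual code.

```scala
// Illustrative stand-ins only: strings model expressions and
// Set[String] models ExpressionSet. None of this is Spark's API.
object EmptyDistinctKeyDemo {
  type Key = Set[String]

  // Hypothetical consumer: DISTINCT over `cols` is treated as redundant
  // when at least one distinct key is fully covered by `cols`.
  def distinctIsRedundant(distinctKeys: Set[Key], cols: Set[String]): Boolean =
    distinctKeys.exists(_.subsetOf(cols))

  // Mirrors the guard in the diff: fail fast if any key is empty.
  def validated(keys: Set[Key]): Set[Key] = {
    require(keys.forall(_.nonEmpty), "distinct keys must be non-empty")
    keys
  }

  def main(args: Array[String]): Unit = {
    val emptyKey: Set[Key] = Set(Set.empty[String])
    // The empty set is a subset of every set, so this check passes for
    // any column, whether or not that column is actually unique.
    println(distinctIsRedundant(emptyKey, Set("a")))
    // A non-empty key only matches when its columns are all referenced.
    println(distinctIsRedundant(Set(Set("b")), Set("a")))
  }
}
```

Under that assumption, an operator that leaked an empty key would let the check succeed for any column, which is why failing fast with `require` seems safer than propagating the key silently.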



