cloud-fan commented on code in PR #36117:
URL: https://github.com/apache/spark/pull/36117#discussion_r853755920
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlanDistinctKeys.scala:
##########
@@ -29,6 +29,12 @@ import org.apache.spark.sql.internal.SQLConf.PROPAGATE_DISTINCT_KEYS_ENABLED
  */
trait LogicalPlanDistinctKeys { self: LogicalPlan =>
  lazy val distinctKeys: Set[ExpressionSet] = {
-    if (conf.getConf(PROPAGATE_DISTINCT_KEYS_ENABLED)) DistinctKeyVisitor.visit(self) else Set.empty
+    if (conf.getConf(PROPAGATE_DISTINCT_KEYS_ENABLED)) {
+      val keys = DistinctKeyVisitor.visit(self)
+      require(keys.forall(_.nonEmpty))
Review Comment:
This is only done in the `else` branch, not the `if` branch. I think we have two options:
1. Keep the requirement, and add the `filter` in the `if` branch as well.
2. Remove the requirement, and remove the `filter` from the `else` branch.

I prefer option 2, as I think an empty expression set does mean something as a distinct key, and we should not ignore this information. It also works the same way as other distinct keys:
1. It can replace all other distinct keys, as it's a subset of any expression set.
2. It can satisfy any distinct key requirement, e.g. removing an unnecessary distinct in aggregate functions.
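
To make the subset reasoning concrete, here is a minimal standalone sketch (not Spark code; it models distinct keys as plain `Set[String]` of column names rather than Catalyst's `ExpressionSet`, and the helper names `canReplace`/`satisfies` are hypothetical). It shows why an empty key set behaves like the strongest possible distinct key: it is a subset of every other key, so it can both replace them and satisfy any requirement.

```scala
object DistinctKeySketch {
  // Hypothetical simplification: a distinct key is a set of column names.
  // (Spark's real representation is Set[ExpressionSet].)
  type Key = Set[String]

  // A known distinct key k1 can replace k2 if k1 is a subset of k2:
  // if fewer columns already determine uniqueness, the superset is redundant.
  def canReplace(k1: Key, k2: Key): Boolean = k1.subsetOf(k2)

  // A set of known distinct keys satisfies a required key if any known
  // key is a subset of the requirement.
  def satisfies(known: Set[Key], required: Key): Boolean =
    known.exists(_.subsetOf(required))

  def main(args: Array[String]): Unit = {
    val empty: Key = Set.empty
    // The empty key is a subset of every key, so it replaces any of them...
    assert(canReplace(empty, Set("a", "b")))
    // ...and it satisfies any requirement, e.g. the grouping keys of an
    // Aggregate, which is what lets the optimizer drop a redundant distinct.
    assert(satisfies(Set(empty), Set("a")))
    // A non-empty key only satisfies requirements it is contained in.
    assert(satisfies(Set(Set("a")), Set("a", "b")))
    assert(!satisfies(Set(Set("c")), Set("a", "b")))
  }
}
```

Under this reading, an empty distinct key means "the plan produces at most one row" (e.g. a global aggregate), which is precisely why discarding it would lose information.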
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]