Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/15668#discussion_r86756665
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ---
@@ -115,9 +115,21 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] {
}
// Extract distinct aggregate expressions.
- val distinctAggGroups = aggExpressions
- .filter(_.isDistinct)
- .groupBy(_.aggregateFunction.children.toSet)
+ val distinctAggGroups = aggExpressions.filter(_.isDistinct).groupBy { e =>
+   if (e.aggregateFunction.children.exists(!_.foldable)) {
+     // Only expand the unfoldable children.
+     e.aggregateFunction.children.filter(!_.foldable).toSet
+   } else {
+     // If all of the aggregate function's children are foldable, we must
+     // still expand at least one of them (here we take the first child);
+     // otherwise we get the wrong result. For example, count(distinct 1)
+     // would be rewritten to count(1) by this rule.
+     // Generally, a distinct aggregate function should not fail the
+     // foldable type check for its first child.
+     e.aggregateFunction.children.take(1).toSet
--- End diff ---
This is a good catch. It would be great if we could get rid of this via
constant folding (not needed in this PR). Another way of getting rid of it
would be to create a separate processing group for these distinct aggregates.
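The grouping fix under discussion can be sketched outside Catalyst with stand-in types. This is a hedged simplification: `Expr` and `AggExpr` below are hypothetical placeholders for Catalyst's `Expression` and `AggregateExpression`, not the real API; only the groupBy key logic mirrors the patch.

```scala
// Hypothetical stand-ins for Catalyst expression classes.
case class Expr(name: String, foldable: Boolean)
case class AggExpr(isDistinct: Boolean, children: Seq[Expr])

// Group distinct aggregates by their set of unfoldable children; when every
// child is foldable, fall back to the first child so expressions like
// count(distinct 1) keep a non-empty key and are not rewritten to count(1).
def distinctAggGroups(aggs: Seq[AggExpr]): Map[Set[Expr], Seq[AggExpr]] =
  aggs.filter(_.isDistinct).groupBy { e =>
    val unfoldable = e.children.filter(!_.foldable)
    if (unfoldable.nonEmpty) unfoldable.toSet
    else e.children.take(1).toSet
  }
```

Under this sketch, `count(distinct 1)` and `count(distinct a, 1)` land in different groups: the former keyed by the literal `1`, the latter by the unfoldable column `a` alone.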