cloud-fan commented on a change in pull request #27428:
URL: https://github.com/apache/spark/pull/27428#discussion_r454391588
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
##########
@@ -102,23 +102,127 @@ import org.apache.spark.sql.types.IntegerType
* {{{
* Aggregate(
* key = ['key]
- * functions = [count(if (('gid = 1)) 'cat1 else null),
- * count(if (('gid = 2)) 'cat2 else null),
+ * functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ * count(if (('gid = 2)) '_gen_attr_2 else null),
* first(if (('gid = 0)) 'total else null) ignore nulls]
* output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
* Aggregate(
- * key = ['key, 'cat1, 'cat2, 'gid]
- * functions = [sum('value) with FILTER('id > 1)]
- * output = ['key, 'cat1, 'cat2, 'gid, 'total])
+ * key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ * functions = [sum('_gen_attr_3)]
+ * output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
* Expand(
- * projections = [('key, null, null, 0, cast('value as bigint), 'id),
+ * projections = [('key, null, null, 0, if ('id > 1) cast('value as bigint) else null, 'id),
* ('key, 'cat1, null, 1, null, null),
* ('key, null, 'cat2, 2, null, null)]
- * output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id])
+ * output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id])
* LocalTableScan [...]
* }}}
*
- * The rule does the following things here:
+ * Third example: single distinct aggregate function with filter clauses and have
+ * not other distinct aggregate function (in sql):
+ * {{{
+ * SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * SUM(value) AS total
+ * FROM
+ * data
+ * GROUP BY
+ * key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ * key = ['key]
+ * functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * sum('value)]
+ * output = ['key, 'cat1_cnt, 'total])
+ * LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ * key = ['key]
+ * functions = [count('_gen_attr_1),
+ * sum('_gen_attr_2)]
+ * output = ['key, 'cat1_cnt, 'total])
+ * Project(
+ * projectionList = ['key, if ('id > 1) 'cat1 else null, cast('value as bigint)]
Review comment:
I meant that we should unify the implementations of the filter clause that
are handled by this rule. This case was not handled by this rule before your PR.
Sorry if I didn't make myself clear enough.
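
For illustration only, here is a minimal sketch of the equivalence the filter handling in the diff above relies on: `COUNT(DISTINCT cat1) FILTER (WHERE id > 1)` gives the same result as projecting `if (id > 1) cat1 else null` and counting the distinct non-null values. It uses plain Scala collections rather than Catalyst. The `Row` case class, the sample rows, and the names `withFilter` and `withProjection` are assumptions made for this example and are not part of the PR.

```scala
// Hypothetical row shape and data, invented for this sketch (not from the PR).
final case class Row(key: String, cat1: String, id: Int, value: Long)

object FilterRewriteSketch {
  val data: Seq[Row] = Seq(
    Row("a", "x", 1, 10L),
    Row("a", "x", 2, 20L),
    Row("a", "y", 3, 30L),
    Row("b", "z", 1, 5L)
  )

  // COUNT(DISTINCT cat1) FILTER (WHERE id > 1):
  // drop the rows that fail the predicate, then count distinct values.
  def withFilter(rows: Seq[Row]): Int =
    rows.filter(_.id > 1).map(_.cat1).distinct.size

  // Rewritten form: project `if (id > 1) cat1 else null` for every row,
  // then count distinct values while ignoring nulls (modelled here as None).
  def withProjection(rows: Seq[Row]): Int =
    rows.flatMap(r => if (r.id > 1) Some(r.cat1) else None).distinct.size

  def main(args: Array[String]): Unit = {
    // Both forms must agree for every grouping key.
    data.groupBy(_.key).foreach { case (key, rows) =>
      assert(withFilter(rows) == withProjection(rows))
      println(s"$key -> ${withProjection(rows)}")
    }
  }
}
```

The sketch relies on `count` ignoring nulls, which is why the rejected rows can be turned into nulls in a projection instead of being filtered out before aggregation.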