cloud-fan commented on a change in pull request #26656: [SPARK-27986][SQL]
Support ANSI SQL filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#discussion_r359222342
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
##########
@@ -75,6 +75,52 @@ import org.apache.spark.sql.types.IntegerType
* LocalTableScan [...]
* }}}
*
+ * Second example: an aggregate function without DISTINCT but with a FILTER clause (in SQL):
+ * {{{
+ *   SELECT
+ *     COUNT(DISTINCT cat1) AS cat1_cnt,
+ *     COUNT(DISTINCT cat2) AS cat2_cnt,
+ *     SUM(value) FILTER (
+ *       WHERE
+ *         id > 1
+ *     ) AS total
+ *   FROM
+ *     data
+ *   GROUP BY
+ *     key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *      key = ['key]
+ *      functions = [COUNT(DISTINCT 'cat1),
+ *                   COUNT(DISTINCT 'cat2),
+ *                   sum('value) with FILTER('id > 1)]
+ *      output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
+ *     LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *      key = ['key]
+ *      functions = [count(if (('gid = 1)) 'cat1 else null),
+ *                   count(if (('gid = 2)) 'cat2 else null),
+ *                   first(if (('gid = 0)) 'total else null) ignore nulls]
+ *      output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
+ *     Aggregate(
+ *        key = ['key, 'cat1, 'cat2, 'gid]
+ *        functions = [sum('value) with FILTER('id > 1)]
+ *        output = ['key, 'cat1, 'cat2, 'gid, 'total])
+ *       Expand(
+ *          projections = [('key, null, null, 0, cast('value as bigint), 'id),
Review comment:
nvm, it's cheap to output an extra column in `Expand`
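
For context, here is a minimal end-to-end sketch of the documented query (editor's addition, not part of the change). It assumes a Spark build that includes this patch and a local SparkSession; the toy rows and the object name are illustrative. Calling `explain(true)` on the result surfaces the Expand/double-Aggregate shape described in the Scaladoc above:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver used only to reproduce the plan shape from the Scaladoc example.
object FilterClauseExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("filter-clause-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy "data" table with the columns used in the example; the rows are made up.
    Seq(
      (1, "a", "x", "k1", 10L),
      (2, "a", "y", "k1", 20L),
      (3, "b", "y", "k2", 30L)
    ).toDF("id", "cat1", "cat2", "key", "value")
      .createOrReplaceTempView("data")

    // Two distinct aggregates plus one non-distinct aggregate with a FILTER clause.
    val df = spark.sql(
      """
        |SELECT
        |  key,
        |  COUNT(DISTINCT cat1) AS cat1_cnt,
        |  COUNT(DISTINCT cat2) AS cat2_cnt,
        |  SUM(value) FILTER (WHERE id > 1) AS total
        |FROM data
        |GROUP BY key
        |""".stripMargin)

    // The optimized logical plan shows the Expand followed by two Aggregates.
    df.explain(true)
    df.show()

    spark.stop()
  }
}
```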