Eric5553 commented on a change in pull request #27368: [SPARK-30651][SQL] Add detailed information for Aggregate operators in EXPLAIN FORMATTED
URL: https://github.com/apache/spark/pull/27368#discussion_r377144347
##########
File path: sql/core/src/test/resources/sql-tests/results/explain.sql.out
##########
@@ -786,6 +870,144 @@ Output: []
(4) Project
+-- !query
+EXPLAIN FORMATTED
+ SELECT
+ COUNT(val) + SUM(key) as TOTAL,
+ COUNT(key) FILTER (WHERE val > 1)
+ FROM explain_temp1
+-- !query schema
+struct<plan:string>
+-- !query output
+== Physical Plan ==
+* HashAggregate (5)
++- Exchange (4)
+ +- HashAggregate (3)
+ +- * ColumnarToRow (2)
+ +- Scan parquet default.explain_temp1 (1)
+
+
+(1) Scan parquet default.explain_temp1
+Output: [key#x, val#x]
+Batched: true
+Location [not included in comparison]/{warehouse_dir}/explain_temp1]
+ReadSchema: struct<key:int,val:int>
+
+(2) ColumnarToRow [codegen id : 1]
+Input: [key#x, val#x]
+
+(3) HashAggregate
+Input: [key#x, val#x]
+Keys: []
+Functions: [partial_count(val#x), partial_sum(cast(key#x as bigint)), partial_count(key#x) FILTER (WHERE (val#x > 1))]
+Aggregate Attributes: [count#xL, sum#xL, count#xL]
+Results: [count#xL, sum#xL, count#xL]
+
+(4) Exchange
+Input: [count#xL, sum#xL, count#xL]
+
+(5) HashAggregate [codegen id : 2]
+Input: [count#xL, sum#xL, count#xL]
+Keys: []
+Functions: [count(val#x), sum(cast(key#x as bigint)), count(key#x)]
+Aggregate Attributes: [count(val#x)#xL, sum(cast(key#x as bigint))#xL, count(key#x)#xL]
+Results: [(count(val#x)#xL + sum(cast(key#x as bigint))#xL) AS TOTAL#xL, count(key#x)#xL AS count(key) FILTER (WHERE (val > 1))#xL]
+
+
+-- !query
+EXPLAIN FORMATTED
+ SELECT key, sort_array(collect_set(val))[0]
+ FROM explain_temp4
+ GROUP BY key
+-- !query schema
+struct<plan:string>
+-- !query output
+== Physical Plan ==
+ObjectHashAggregate (5)
++- Exchange (4)
+ +- ObjectHashAggregate (3)
+ +- * ColumnarToRow (2)
+ +- Scan parquet default.explain_temp4 (1)
+
+
+(1) Scan parquet default.explain_temp4
+Output: [key#x, val#x]
+Batched: true
+Location [not included in comparison]/{warehouse_dir}/explain_temp4]
+ReadSchema: struct<key:int,val:string>
+
+(2) ColumnarToRow [codegen id : 1]
+Input: [key#x, val#x]
+
+(3) ObjectHashAggregate
+Input: [key#x, val#x]
+Keys: [key#x]
+Functions: [partial_collect_set(val#x, 0, 0)]
+Aggregate Attributes: [buf#x]
+Results: [key#x, buf#x]
+
+(4) Exchange
+Input: [key#x, buf#x]
+
+(5) ObjectHashAggregate
+Input: [key#x, buf#x]
+Keys: [key#x]
+Functions: [collect_set(val#x, 0, 0)]
+Aggregate Attributes: [collect_set(val#x, 0, 0)#x]
Review comment:
I just gave this idea a try. The attribute format is controlled by
`AttributeReference.toString = s"$name#${exprId.id}$typeSuffix$delaySuffix"`.
After removing `#${exprId.id}`, the output is:
```
Input: [key, buf]
Keys: [key]
Functions: [collect_set(val, 0, 0)]
Aggregate Attributes: [collect_set(val, 0, 0)]
Results: [key, sort_array(collect_set(val, 0, 0), true)[0] AS sort_array(collect_set(val), true)[0]#x]
```
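To make the behavior concrete, here is a minimal self-contained sketch of that format string. The `ExprId`/`AttrRef` classes below are simplified stand-ins I wrote for illustration, not Spark's actual `AttributeReference` (the real class also carries `delaySuffix` and type information):

```scala
// Simplified stand-ins for Catalyst's ExprId / AttributeReference,
// just to show how the quoted format string shapes the output.
case class ExprId(id: Long)

case class AttrRef(name: String, exprId: ExprId, typeSuffix: String = "") {
  // Mirrors the s"$name#${exprId.id}$typeSuffix$delaySuffix" format quoted above
  def withId: String = s"$name#${exprId.id}$typeSuffix"
  // Hypothetical variant with the `#${exprId.id}` segment removed
  def withoutId: String = s"$name$typeSuffix"
}

val key = AttrRef("key", ExprId(42))
println(key.withId)     // prints key#42
println(key.withoutId)  // prints key
```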
Some follow-up questions:
1. Of course, this change affects too much: every `exprId` is removed. We need to find another way to limit the impact, right?
2. The remaining `#x` is printed by the `Alias` class; do we need that?
3. Taking the expression `sort_array(collect_set(val, 0, 0), true)[0]` as an example, the `0, 0), true)[0]` part is hard for me to understand. Do we need to format those attributes as fieldName -> value pairs?
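As a rough illustration of question 3, a fieldName -> value rendering could look like the sketch below. The parameter names are my assumption: I am treating the two trailing `0`s in `collect_set(val, 0, 0)` as the mutable/input aggregation buffer offsets.

```scala
// Hypothetical pretty-printer: render an aggregate call as fieldName=value
// pairs instead of bare positional arguments.
def formatCall(fn: String, args: Seq[(String, String)]): String =
  args.map { case (name, value) => s"$name=$value" }
      .mkString(s"$fn(", ", ", ")")

val pretty = formatCall("collect_set", Seq(
  "child" -> "val",
  "mutableAggBufferOffset" -> "0",   // assumed meaning of the first trailing 0
  "inputAggBufferOffset" -> "0"))    // assumed meaning of the second trailing 0
println(pretty)
// prints collect_set(child=val, mutableAggBufferOffset=0, inputAggBufferOffset=0)
```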
@cloud-fan @gatorsmile please share your feedback :-)
Maybe we can fold this improvement into the follow-up PR
https://github.com/apache/spark/pull/27509?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services