Eric5553 opened a new pull request #27368: [SPARK-30651][SQL] Add detailed information for Aggregate operators in EXPLAIN FORMATTED
URL: https://github.com/apache/spark/pull/27368

### What changes were proposed in this pull request?
Currently, `EXPLAIN FORMATTED` reports only the input attributes of HashAggregate/ObjectHashAggregate/SortAggregate, while `EXPLAIN EXTENDED` also reports Keys, Functions, and so on. This PR enhances `EXPLAIN FORMATTED` so that it matches the original explain behavior.

### Why are the changes needed?
The newly added `EXPLAIN FORMATTED` provides less information than the original `EXPLAIN EXTENDED`.

### Does this PR introduce any user-facing change?
Yes. Taking the HashAggregate explain result as an example:

**SQL**
```
EXPLAIN FORMATTED
  SELECT
    COUNT(val) FILTER (WHERE val = 1),
    COUNT(key) FILTER (WHERE val > 1)
  FROM explain_temp1;
```

**EXPLAIN EXTENDED**
```
== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[count(val#156), count(key#155)], output=[count(val) FILTER (WHERE (val = 1))#462L, count(key) FILTER (WHERE (val > 1))#463L])
+- Exchange SinglePartition, true, [id=#1007]
   +- HashAggregate(keys=[], functions=[partial_count(val#156) FILTER (WHERE (val#156 = 1)), partial_count(key#155) FILTER (WHERE (val#156 > 1))], output=[count#466L, count#467L])
      +- *(1) ColumnarToRow
         +- FileScan parquet default.explain_temp1[key#155,val#156] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/wuxin/Eric/spark-dev/spark/spark-warehouse/explain_temp1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<key:int,val:int>
```

**EXPLAIN FORMATTED - BEFORE**
```
== Physical Plan ==
* HashAggregate (5)
+- Exchange (4)
   +- HashAggregate (3)
      +- * ColumnarToRow (2)
         +- Scan parquet default.explain_temp1 (1)

... ...
(3) HashAggregate
Input: [key#x, val#x]
... ...
```

**EXPLAIN FORMATTED - AFTER**
```
== Physical Plan ==
* HashAggregate (5)
+- Exchange (4)
   +- HashAggregate (3)
      +- * ColumnarToRow (2)
         +- Scan parquet default.explain_temp1 (1)

... ...
(3) HashAggregate
Input: [key#x, val#x]
Output: [count#xL, count#xL]
Keys: []
Functions: [partial_count(val#x) FILTER (WHERE (val#x = 1)), partial_count(key#x) FILTER (WHERE (val#x > 1))]
... ...
```

### How was this patch tested?
Three tests were added to explain.sql, one each for HashAggregate, ObjectHashAggregate, and SortAggregate.
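
To try the example query locally, a minimal reproduction might look like the sketch below. The table definition is an assumption: this description does not include the DDL, so the column names and types are inferred from the `ReadSchema: struct<key:int,val:int>` and `Format: Parquet` entries in the plan output.

```sql
-- Assumed setup (not shown in this PR description): columns and storage
-- format are inferred from the FileScan line of the EXPLAIN EXTENDED output.
CREATE TABLE explain_temp1 (key INT, val INT) USING PARQUET;

-- Query taken from the PR description. With this change applied, the
-- HashAggregate entries should list Output, Keys, and Functions in
-- addition to Input.
EXPLAIN FORMATTED
  SELECT
    COUNT(val) FILTER (WHERE val = 1),
    COUNT(key) FILTER (WHERE val > 1)
  FROM explain_temp1;

-- Optional cleanup of the assumed table.
DROP TABLE explain_temp1;
```

Expression IDs such as `#156` vary between runs, which is why the formatted examples above show them normalized as `#x`.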
