[GitHub] [spark] Eric5553 opened a new pull request #27368: [SPARK-30651][SQL] Add detailed information for Aggregate operators in EXPLAIN FORMATTED

GitBox Mon, 27 Jan 2020 08:34:20 -0800

URL: https://github.com/apache/spark/pull/27368
 
 
   ### What changes were proposed in this pull request?
   Currently, `EXPLAIN FORMATTED` reports only the input attributes of HashAggregate/ObjectHashAggregate/SortAggregate, while `EXPLAIN EXTENDED` provides further details such as Keys and Functions. This PR enhances `EXPLAIN FORMATTED` so that it reports the same details as the original explain behavior.
   
   ### Why are the changes needed?
   The newly added `EXPLAIN FORMATTED` provides less information than the original `EXPLAIN EXTENDED`.
   
   ### Does this PR introduce any user-facing change?
   Yes. Taking the HashAggregate explain result as an example:
   
   **SQL**
   ```
   EXPLAIN FORMATTED
     SELECT
       COUNT(val) FILTER (WHERE val = 1),
       COUNT(key) FILTER (WHERE val > 1)
     FROM explain_temp1;
   ```
   
   **EXPLAIN EXTENDED**
   ```
   == Physical Plan ==
   *(2) HashAggregate(keys=[], functions=[count(val#156), count(key#155)], output=[count(val) FILTER (WHERE (val = 1))#462L, count(key) FILTER (WHERE (val > 1))#463L])
   +- Exchange SinglePartition, true, [id=#1007]
      +- HashAggregate(keys=[], functions=[partial_count(val#156) FILTER (WHERE (val#156 = 1)), partial_count(key#155) FILTER (WHERE (val#156 > 1))], output=[count#466L, count#467L])
         +- *(1) ColumnarToRow
            +- FileScan parquet default.explain_temp1[key#155,val#156] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/wuxin/Eric/spark-dev/spark/spark-warehouse/explain_temp1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<key:int,val:int>
   ```
   
   **EXPLAIN FORMATTED - BEFORE**
   ```
   == Physical Plan ==
   * HashAggregate (5)
   +- Exchange (4)
      +- HashAggregate (3)
         +- * ColumnarToRow (2)
            +- Scan parquet default.explain_temp1 (1)
   
   ...
   ...
   (3) HashAggregate 
   Input: [key#x, val#x]
   ...
   ...
   ```
   
   **EXPLAIN FORMATTED - AFTER**
   ```
   == Physical Plan ==
   * HashAggregate (5)
   +- Exchange (4)
      +- HashAggregate (3)
         +- * ColumnarToRow (2)
            +- Scan parquet default.explain_temp1 (1)
   
   ...
   ...
   (3) HashAggregate 
   Input: [key#x, val#x]
   Output: [count#xL, count#xL]
   Keys: []
   Functions: [partial_count(val#x) FILTER (WHERE (val#x = 1)), partial_count(key#x) FILTER (WHERE (val#x > 1))]
   ...
   ...
   ```
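
The per-operator section in the AFTER output can be pictured as a small formatting routine. The following is only an illustrative Python sketch, not the Scala code in this PR: `AggregateNode`, `bracket`, and `verbose_string` are hypothetical names invented for this example, and the field order simply mirrors the sample output (Input, Output, Keys, Functions).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AggregateNode:
    """Hypothetical stand-in for a physical aggregate operator (not Spark's API)."""
    operator_id: int
    name: str
    input_attrs: List[str]
    output_attrs: List[str]
    keys: List[str] = field(default_factory=list)
    functions: List[str] = field(default_factory=list)


def bracket(items: List[str]) -> str:
    """Render a list the way the sample output does: [a, b] or [] when empty."""
    return "[" + ", ".join(items) + "]"


def verbose_string(node: AggregateNode) -> str:
    """Build the detail section for one operator, one labelled line per field."""
    lines = [
        f"({node.operator_id}) {node.name}",
        f"Input: {bracket(node.input_attrs)}",
        f"Output: {bracket(node.output_attrs)}",
        f"Keys: {bracket(node.keys)}",
        f"Functions: {bracket(node.functions)}",
    ]
    return "\n".join(lines)


# Operator (3) from the example above, with expression IDs already shown as #x.
partial_agg = AggregateNode(
    operator_id=3,
    name="HashAggregate",
    input_attrs=["key#x", "val#x"],
    output_attrs=["count#xL", "count#xL"],
    keys=[],
    functions=[
        "partial_count(val#x) FILTER (WHERE (val#x = 1))",
        "partial_count(key#x) FILTER (WHERE (val#x > 1))",
    ],
)
print(verbose_string(partial_agg))
```

Before this change, only the `Input:` line of such a section was reported for aggregate operators.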
   
   ### How was this patch tested?
   Three tests were added to `explain.sql`, one each for HashAggregate, ObjectHashAggregate, and SortAggregate.
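
The `EXPLAIN FORMATTED` snippets above show expression IDs as `#x` (for example `val#x`, `count#xL`), whereas the raw `EXPLAIN EXTENDED` output carries concrete IDs such as `val#156`. This suggests the IDs are normalized so that the expected output stays stable across runs. A minimal Python sketch of that kind of normalization follows; it assumes a simple regex substitution, and `normalize_expr_ids` is a hypothetical helper named for this example, not a function from the Spark test suite.

```python
import re

# Matches '#' followed by one or more digits, e.g. '#156' or the '#466' in '#466L'.
_EXPR_ID = re.compile(r"#\d+")


def normalize_expr_ids(plan_text: str) -> str:
    """Replace every numeric expression ID with the placeholder '#x'."""
    return _EXPR_ID.sub("#x", plan_text)


raw = "HashAggregate(keys=[], functions=[count(val#156), count(key#155)], output=[count#466L, count#467L])"
print(normalize_expr_ids(raw))
```

Any suffix after the digits, such as the `L` in `count#466L`, is left in place, which matches the `count#xL` form seen in the formatted output.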

