peter-toth commented on code in PR #37334:
URL: https://github.com/apache/spark/pull/37334#discussion_r935756011


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/CoalesceShufflePartitionsSuite.scala:
##########
@@ -339,12 +339,12 @@ class CoalesceShufflePartitionsSuite extends 
SparkFunSuite {
       //     ShuffleQueryStage 0
       //   ShuffleQueryStage 2
       //     ReusedQueryStage 0
-      val grouped = df.groupBy("key").agg(max("value").as("value"))
+      val grouped = df.groupBy((col("key") + 1).as("key")).agg(max("value").as("value"))

Review Comment:
   I had to modify the test because the `AliasAwareOutputPartitioning` fix changed the explain plan of the original query from:
   ```
   Union
   :- *(5) HashAggregate(keys=[_groupingexpression#79L], functions=[max(value#38L)], output=[(key + 1)#44L, max(value)#45L])
   :  +- AQEShuffleRead coalesced
   :     +- ShuffleQueryStage 3
   :        +- Exchange hashpartitioning(_groupingexpression#79L, 5), ENSURE_REQUIREMENTS, [plan_id=693]
   :           +- *(3) HashAggregate(keys=[_groupingexpression#79L], functions=[partial_max(value#38L)], output=[_groupingexpression#79L, max#62L])
   :              +- *(3) HashAggregate(keys=[key#12L], functions=[max(value#13L)], output=[value#38L, _groupingexpression#79L])
   :                 +- AQEShuffleRead coalesced
   :                    +- ShuffleQueryStage 0
   :                       +- Exchange hashpartitioning(key#12L, 5), ENSURE_REQUIREMENTS, [plan_id=623]
   :                          +- *(1) HashAggregate(keys=[key#12L], functions=[partial_max(value#13L)], output=[key#12L, max#64L])
   :                             +- *(1) Project [id#10L AS key#12L, id#10L AS value#13L]
   :                                +- *(1) Range (0, 6, step=1, splits=10)
   +- *(6) HashAggregate(keys=[_groupingexpression#80L], functions=[max(value#38L)], output=[(key + 2)#51L, max(value)#52L])
      +- AQEShuffleRead coalesced
         +- ShuffleQueryStage 4
            +- Exchange hashpartitioning(_groupingexpression#80L, 5), ENSURE_REQUIREMENTS, [plan_id=719]
               +- *(4) HashAggregate(keys=[_groupingexpression#80L], functions=[partial_max(value#38L)], output=[_groupingexpression#80L, max#66L])
                  +- *(4) HashAggregate(keys=[key#12L], functions=[max(value#13L)], output=[value#38L, _groupingexpression#80L])
                     +- AQEShuffleRead coalesced
                        +- ShuffleQueryStage 2
                           +- ReusedExchange [key#12L, max#64L], Exchange hashpartitioning(key#12L, 5), ENSURE_REQUIREMENTS, [plan_id=623]
   ```
   to the following plan with one fewer exchange:
   ```
   Union
   :- *(3) HashAggregate(keys=[_groupingexpression#75L], functions=[max(value#38L)], output=[(key + 1)#44L, max(value)#45L])
   :  +- AQEShuffleRead coalesced
   :     +- ShuffleQueryStage 0
   :        +- Exchange hashpartitioning(_groupingexpression#75L, 5), ENSURE_REQUIREMENTS, [plan_id=514]
   :           +- *(1) HashAggregate(keys=[_groupingexpression#75L], functions=[partial_max(value#38L)], output=[_groupingexpression#75L, max#62L])
   :              +- *(1) HashAggregate(keys=[key#12L], functions=[max(value#13L)], output=[value#38L, _groupingexpression#75L])
   :                 +- *(1) HashAggregate(keys=[key#12L], functions=[partial_max(value#13L)], output=[key#12L, max#64L])
   :                    +- *(1) Project [id#10L AS key#12L, id#10L AS value#13L]
   :                       +- *(1) Range (0, 6, step=1, splits=10)
   +- *(4) HashAggregate(keys=[_groupingexpression#76L], functions=[max(value#38L)], output=[(key + 2)#51L, max(value)#52L])
      +- AQEShuffleRead coalesced
         +- ShuffleQueryStage 1
            +- Exchange hashpartitioning(_groupingexpression#76L, 5), ENSURE_REQUIREMENTS, [plan_id=532]
               +- *(2) HashAggregate(keys=[_groupingexpression#76L], functions=[partial_max(value#38L)], output=[_groupingexpression#76L, max#66L])
                  +- *(2) HashAggregate(keys=[key#12L], functions=[max(value#13L)], output=[value#38L, _groupingexpression#76L])
                     +- *(2) HashAggregate(keys=[key#12L], functions=[partial_max(value#13L)], output=[key#12L, max#64L])
                        +- *(2) Project [id#55L AS key#12L, id#55L AS value#13L]
                           +- *(2) Range (0, 6, step=1, splits=10)
   ```
   and so the original query no longer matched the `test case 2` description.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
