[GitHub] [spark] manuzhang opened a new pull request #29593: [SPARK-32753][SQL] Update logical link when eliminating Exchange with same partitioning with AQE

GitBox Mon, 31 Aug 2020 00:13:38 -0700


manuzhang opened a new pull request #29593:
URL: https://github.com/apache/spark/pull/29593



   ### What changes were proposed in this pull request?
   With AQE enabled, update the logical link when `EnsureRequirements` 
eliminates an `Exchange` that has the same partitioning as its child.
   
   ### Why are the changes needed?
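   Deduplicating and repartitioning on the same column produces duplicate rows 
with AQE, because `EnsureRequirements` eliminates the `Exchange` but the logical 
link is not updated. In the example below, the AQE plan lacks the final 
`HashAggregate` that sits above the `Exchange` in the non-AQE plan, and each id 
is returned twice.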
   
   ```
   spark.range(10).union(spark.range(10)).createOrReplaceTempView("v1")
   val df = spark.sql("select id from v1 group by id distribute by id") 
   println(df.collect().mkString(","))
   println(df.queryExecution.executedPlan)
   
   // With AQE
   [4],[0],[3],[2],[1],[7],[6],[8],[5],[9],[4],[0],[3],[2],[1],[7],[6],[8],[5],[9]
   AdaptiveSparkPlan(isFinalPlan=true)
   +- CustomShuffleReader local
      +- ShuffleQueryStage 0
         +- Exchange hashpartitioning(id#183L, 10), true
            +- *(3) HashAggregate(keys=[id#183L], functions=[], output=[id#183L])
               +- Union
                  :- *(1) Range (0, 10, step=1, splits=2)
                  +- *(2) Range (0, 10, step=1, splits=2)
   
   // Without AQE
   [4],[7],[0],[6],[8],[3],[2],[5],[1],[9]
   *(4) HashAggregate(keys=[id#206L], functions=[], output=[id#206L])
   +- Exchange hashpartitioning(id#206L, 10), true
      +- *(3) HashAggregate(keys=[id#206L], functions=[], output=[id#206L])
         +- Union
            :- *(1) Range (0, 10, step=1, splits=2)
            +- *(2) Range (0, 10, step=1, splits=2)
   ```
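   
   The comparison above can be scripted so that the duplicate rows are detected 
rather than read off the output. This is only a sketch: it assumes a 
`spark-shell` session where `spark` is the active `SparkSession`, and 
`hasDuplicates` and `collectIds` are hypothetical helper names that are not part 
of this patch.
   
   ```scala
   // Sketch only: assumes a spark-shell session with `spark` in scope.
   // `hasDuplicates` and `collectIds` are hypothetical helpers for illustration.

   // True when at least one value occurs more than once.
   def hasDuplicates[T](rows: Seq[T]): Boolean = rows.distinct.size != rows.size

   // Runs the query from the description with AQE switched on or off
   // and returns the collected ids.
   def collectIds(aqeEnabled: Boolean): Seq[Long] = {
     spark.conf.set("spark.sql.adaptive.enabled", aqeEnabled.toString)
     spark.range(10).union(spark.range(10)).createOrReplaceTempView("v1")
     spark.sql("select id from v1 group by id distribute by id")
       .collect()
       .map(_.getLong(0))
       .toSeq
   }

   val withAqe = collectIds(aqeEnabled = true)
   val withoutAqe = collectIds(aqeEnabled = false)

   // On a build affected by this bug, the AQE run reports duplicates.
   println(s"AQE on:  ${withAqe.size} rows, duplicates = ${hasDuplicates(withAqe)}")
   println(s"AQE off: ${withoutAqe.size} rows, duplicates = ${hasDuplicates(withoutAqe)}")
   ```
   
   Comparing the two runs in the same session keeps the input data and the 
query identical, so any difference in row count comes from the AQE setting alone.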
   
   
   ### Does this PR introduce _any_ user-facing change?
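   Yes. This fixes a bug: with AQE enabled, deduplicating and repartitioning 
on the same column no longer returns duplicate rows.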
   
   
   ### How was this patch tested?
   Added a test.
   

