[GitHub] [spark] wangyum opened a new pull request #27986: [SPARK-31220][SQL] repartition obeys initialPartitionNum when adaptiveExecutionEnabled

GitBox Mon, 23 Mar 2020 01:54:46 -0700

wangyum opened a new pull request #27986: [SPARK-31220][SQL] repartition obeys initialPartitionNum when adaptiveExecutionEnabled
URL: https://github.com/apache/spark/pull/27986
 
 
   ### What changes were proposed in this pull request?
   This PR makes `repartition`/`DISTRIBUTE BY` obey [initialPartitionNum](https://github.com/apache/spark/blob/af4248b2d661d04fec89b37857a47713246d9465/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L446-L455) (`spark.sql.adaptive.coalescePartitions.initialPartitionNum`) when adaptive execution is enabled.
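   The selection rule can be sketched as a plain Scala function. This is a minimal model under stated assumptions, not Spark's `SQLConf` code: `ShufflePartitionRule` and its parameter names are hypothetical, the sketch assumes the rule applies only to repartitions that do not specify their own partition count, and it ignores any other flags the real implementation may consult (for example `spark.sql.adaptive.coalescePartitions.enabled`). The fallback of 200 is the default of `spark.sql.shuffle.partitions`.

   ```scala
   // Hypothetical model of the rule this PR describes; names are illustrative
   // and this is not Spark's SQLConf implementation.
   object ShufflePartitionRule {
     // Default value of spark.sql.shuffle.partitions.
     val DefaultShufflePartitions: Int = 200

     // Partition count for a repartition/DISTRIBUTE BY that does not fix its own
     // count (assumption: explicit counts such as repartition(10, col) bypass this).
     def numShufflePartitions(
         adaptiveEnabled: Boolean,
         initialPartitionNum: Option[Int],
         shufflePartitions: Int = DefaultShufflePartitions): Int =
       if (adaptiveEnabled) initialPartitionNum.getOrElse(shufflePartitions)
       else shufflePartitions
   }
   ```

   With the settings from the reproduction below (adaptive execution on, `initialPartitionNum=1000`) the sketch yields 1000; with adaptive execution off, or with `initialPartitionNum` unset, it falls back to 200.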
   
   
   ### Why are the changes needed?
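   To make `DISTRIBUTE BY` and `GROUP BY` use the same number of shuffle partitions. Currently, with adaptive execution enabled, the `GROUP BY` exchange uses `initialPartitionNum` while the `DISTRIBUTE BY` exchange still uses `spark.sql.shuffle.partitions` (200 by default), as the plans below show.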
   How to reproduce:
   ```scala
   spark.sql("CREATE TABLE spark_31220(id int)")
   spark.sql("set spark.sql.adaptive.enabled=true")
   spark.sql("set spark.sql.adaptive.coalescePartitions.initialPartitionNum=1000")
   ```
   
   Before this PR:
   ```
   scala> spark.sql("SELECT id from spark_31220 GROUP BY id").explain
   == Physical Plan ==
   AdaptiveSparkPlan(isFinalPlan=false)
   +- HashAggregate(keys=[id#5], functions=[])
      +- Exchange hashpartitioning(id#5, 1000), true, [id=#171]
         +- HashAggregate(keys=[id#5], functions=[])
            +- FileScan parquet default.spark_31220[id#5]
   
   scala> spark.sql("SELECT id from spark_31220 DISTRIBUTE BY id").explain
   == Physical Plan ==
   AdaptiveSparkPlan(isFinalPlan=false)
   +- Exchange hashpartitioning(id#5, 200), false, [id=#179]
      +- FileScan parquet default.spark_31220[id#5]
   ```
   After this PR:
   ```
   scala> spark.sql("SELECT id from spark_31220 GROUP BY id").explain
   == Physical Plan ==
   AdaptiveSparkPlan(isFinalPlan=false)
   +- HashAggregate(keys=[id#5], functions=[])
      +- Exchange hashpartitioning(id#5, 1000), true, [id=#171]
         +- HashAggregate(keys=[id#5], functions=[])
            +- FileScan parquet default.spark_31220[id#5]
   
   scala> spark.sql("SELECT id from spark_31220 DISTRIBUTE BY id").explain
   == Physical Plan ==
   AdaptiveSparkPlan(isFinalPlan=false)
   +- Exchange hashpartitioning(id#5, 1000), false, [id=#179]
      +- FileScan parquet default.spark_31220[id#5]
   ```
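   To compare the before and after plans mechanically, the partition count of each `hashpartitioning(...)` node can be extracted from the `explain` text. The helper below is hypothetical (`PlanPartitions` and `counts` are illustrative names, not Spark APIs), works on plan strings only, and assumes the partitioning expressions contain no parentheses of their own.

   ```scala
   // Hypothetical helper: pulls shuffle partition counts out of physical-plan text.
   object PlanPartitions {
     // Matches e.g. "hashpartitioning(id#5, 1000)" and captures the trailing count.
     // [^)]* is greedy, so with several keys it backtracks to the last comma.
     private val HashPartitioning = """hashpartitioning\([^)]*,\s*(\d+)\)""".r

     // All hashpartitioning partition counts found in the plan, in order of appearance.
     def counts(plan: String): List[Int] =
       HashPartitioning.findAllMatchIn(plan).map(_.group(1).toInt).toList
   }
   ```

   Applied to the `DISTRIBUTE BY` plans above, `counts` returns `List(200)` before this PR and `List(1000)` after it, matching the `GROUP BY` plan.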
   
   ### Does this PR introduce any user-facing change?
   No.
   
   
   ### How was this patch tested?
   Unit test.


