koertkuipers edited a comment on pull request #27986: URL: https://github.com/apache/spark/pull/27986#issuecomment-647160988
i rebuilt spark using this pullreq hoping that `def repartition(partitionExprs: Column*)` would now use adaptive execution to scale the number of reducers, but i see no such behavior. my simple test job only does 3 things:
1. read from a parquet datasource
2. repartition by one column
3. write to a parquet datasink

what i see is that the actual number of shuffle partitions is always equal to `spark.sql.shuffle.partitions`, or to `spark.sql.adaptive.coalescePartitions.initialPartitionNum` if it has been set. it never adapts to the data size as i would expect with adaptive execution (which it does with a groupBy). e.g. if i have adaptive execution enabled and set `spark.sql.adaptive.coalescePartitions.initialPartitionNum=10`, then all my `.repartition(...)` operations run with 10 reducers irrespective of data size... which is not a good thing.
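
for reference, here is a minimal sketch of the kind of test job described above. the input/output paths, the column name `key`, and the config values are hypothetical stand-ins, not the exact job i ran:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object RepartitionAqeTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("repartition-aqe-test")
      // enable adaptive execution and set an initial partition number (hypothetical values)
      .config("spark.sql.adaptive.enabled", "true")
      .config("spark.sql.adaptive.coalescePartitions.initialPartitionNum", "10")
      .getOrCreate()

    spark.read.parquet("/tmp/input")   // 1. read from a parquet datasource
      .repartition(col("key"))         // 2. repartition by one column
      .write.mode("overwrite")
      .parquet("/tmp/output")          // 3. write to a parquet datasink

    spark.stop()
  }
}
```

with this setup the shuffle introduced by `.repartition(col("key"))` always runs with 10 reducers, regardless of the input size.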
