koertkuipers edited a comment on pull request #27986: URL: https://github.com/apache/spark/pull/27986#issuecomment-647160988
i rebuilt spark using this pullreq hoping that `def repartition(partitionExprs: Column*)` would now use adaptive execution to scale the number of reducers, but i see no such behavior. my simple test job only does 3 things:
1. read from a parquet datasource
2. repartition by one column
3. write to a parquet datasink

what i see is that the actual number of shuffle partitions is always equal to `spark.sql.shuffle.partitions`, or to `spark.sql.adaptive.coalescePartitions.initialPartitionNum` if it has been set. it never adapts to the data size as i would expect with adaptive execution (which it does with a groupBy). e.g. if i have adaptive execution enabled and set `spark.sql.adaptive.coalescePartitions.initialPartitionNum=10`, then all my `.repartition(...)` operations run with 10 reducers irrespective of data size... which is not a good thing.
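
for reference, here is a minimal sketch of the kind of test job described above. the input/output paths, the column name `key`, and the config values are hypothetical stand-ins, not the exact job i ran:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object RepartitionAqeTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("repartition-aqe-test")
      // enable adaptive execution and set an initial partition number (hypothetical values)
      .config("spark.sql.adaptive.enabled", "true")
      .config("spark.sql.adaptive.coalescePartitions.initialPartitionNum", "10")
      .getOrCreate()

    spark.read.parquet("/tmp/input")   // 1. read from a parquet datasource
      .repartition(col("key"))         // 2. repartition by one column
      .write.mode("overwrite")
      .parquet("/tmp/output")          // 3. write to a parquet datasink

    spark.stop()
  }
}
```

with this setup the shuffle introduced by `.repartition(col("key"))` always runs with 10 reducers, regardless of the input size.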
