HeartSaVioR edited a comment on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-768801016
> AQE won't kick in if users specify num partitions, e.g. df.repartition(5), I think the same applies here if the sink requires a certain num partitions.

The case I have in mind is `df.repartition(5).write.(blabla).save()`, where the sink requires a specific distribution but does not require a specific number of partitions. Today Spark would repartition to the default number of shuffle partitions, and I'm not sure that changing it to None (SPARK-34230) would preserve the user's intention.

There's a gap here: end users may know a good number of partitions but not the sink's distribution/ordering requirements. So they can't add the same repartition the sink will require (with the number of partitions specified), which would otherwise give Spark the opportunity to deduplicate the two neighboring repartitions.
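To make the scenario concrete, here is a toy model in plain Python. It is only a sketch, not Spark code: `Repartition`, `plan_write`, `final_partitions`, and the `date` clustering key are all hypothetical names, and the deduplication branch models the opportunity described above rather than confirmed Spark behavior. The only Spark fact assumed is that `spark.sql.shuffle.partitions` defaults to 200.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Assumption: stands in for spark.sql.shuffle.partitions (documented default: 200).
DEFAULT_SHUFFLE_PARTITIONS = 200


@dataclass(frozen=True)
class Repartition:
    """Toy plan node: a shuffle on `keys` into `num_partitions` partitions."""
    keys: Tuple[str, ...]          # empty tuple models df.repartition(5) with no columns
    num_partitions: Optional[int]  # None would mean "let Spark decide"


def plan_write(user_ops: Sequence[Repartition], sink_keys: Sequence[str]) -> List[Repartition]:
    """Append the repartition the sink requires, unless the user already supplied it."""
    plan = list(user_ops)
    required_keys = tuple(sink_keys)
    if plan and plan[-1].keys == required_keys:
        # Hypothetical deduplication: the user's repartition already matches the
        # sink's distribution, so their partition count is kept.
        return plan
    # Otherwise the sink's repartition is added with the default partition count.
    plan.append(Repartition(required_keys, DEFAULT_SHUFFLE_PARTITIONS))
    return plan


def final_partitions(plan: Sequence[Repartition]) -> Optional[int]:
    """Partition count of the last shuffle, i.e. what the sink actually receives."""
    return plan[-1].num_partitions


# The user knows a good partition count but not the sink's distribution.
unaware = plan_write([Repartition((), 5)], ["date"])
print(final_partitions(unaware))  # 200: the user's 5 is overridden

# The user also knows the sink's distribution and repeats it with a count.
aware = plan_write([Repartition(("date",), 5)], ["date"])
print(final_partitions(aware))    # 5: the two neighboring repartitions merge
```

In the first call the user's intent (5 partitions) is lost because the sink's repartition comes last; in the second it survives only because the user happened to know the sink's requirement, which is exactly the knowledge end users usually lack.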
