cloud-fan commented on issue #26946: [SPARK-30036][SQL] Fix: REPARTITION hint does not work with order by
URL: https://github.com/apache/spark/pull/26946#issuecomment-571101291

What `.sort` should guarantee is that the output is ordered; users shouldn't care about the number of partitions. It's more efficient to shuffle only once for the query `df.repartition(10).sort("id")`. This is just an optimization and does not change the semantics. For `df.join(df.repartition(10), Seq("id"), "left")`, again, we don't care about the number of result partitions. If there is a way to save shuffles, please propose it.
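
To see how many shuffles each of the two queries above actually incurs, one option is to print their physical plans and count the `Exchange` nodes. The following is only a minimal sketch: the local `SparkSession`, the object name `RepartitionSortSketch`, and the `spark.range` test data are assumptions made for illustration and are not part of this PR, and the plans printed will depend on the Spark version and configuration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-alone driver, used only to inspect the plans.
object RepartitionSortSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")                 // assumption: local mode for a quick check
      .appName("repartition-sort-sketch")
      .getOrCreate()

    // Assumed test data: a single "id" column.
    val df = spark.range(0, 1000).toDF("id")

    // Query 1: repartition followed by a global sort.
    val sorted = df.repartition(10).sort("id")
    sorted.explain()                      // count the Exchange nodes in the output
    println(s"sort output partitions: ${sorted.rdd.getNumPartitions}")

    // Query 2: left join against a repartitioned copy of the same data.
    val joined = df.join(df.repartition(10), Seq("id"), "left")
    joined.explain()

    spark.stop()
  }
}
```

Each `Exchange` node in the printed plan corresponds to one shuffle, so comparing the output before and after a change shows whether a shuffle was saved or added.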
