stczwd commented on issue #26946: [SPARK-30036][SQL] Fix: REPARTITION hint does not work with order by
URL: https://github.com/apache/spark/pull/26946#issuecomment-571034381

> Yes it is. But it is similar to an outer join: `df.join(df.repartition(10), Seq("id"), "left")` results in 200 partitions, and now `df.repartition(10).sort("id")` results in 10 partitions. Should they be the same?

> Sorry for the wrong example. I mean the user should use the right way to change the partition number. Obviously `df.repartition(10).sort("id")` should return the number of partitions set by `spark.sql.shuffle.partitions`.

Thanks for paying attention to this. The main question you raise is whether we should change the partition number for `OrderedDistribution`.

You added the `REPARTITION` hint in [SPARK-28746](https://github.com/apache/spark/pull/25464), so you know what it means to users. In other cases, the `REPARTITION` hint changes the number of result partitions through a shuffle, but it did not work with `ORDER BY`, which confused users. `REPARTITION` is a great way for users to control the number of final result partitions, and we should keep it working for every query.

Besides, `sort("id")` requires a global `OrderedDistribution`, which usually produces the final result. It is not easy to set that partition number through defaultShufflePartitions, especially in large queries with multiple shuffles.

Finally, `df.repartition(10).sort("id")` gives users a good way to control both the shuffle and the final result. Users probably won't write `df.repartition(10).sort("id")` unless they want to change the final partition number; it is not a common pattern otherwise.

Correct me if I'm wrong. Thanks.
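For anyone who wants to check the behaviour under discussion, here is a minimal sketch that prints the partition counts for the three cases mentioned in this thread. It assumes a local Spark session; the object name, the view name `t`, the row count, and the disabled broadcast threshold are illustrative choices, not part of the PR. The printed numbers depend on the Spark version, on whether this patch is applied, and on adaptive execution settings, so none are asserted here.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object, for illustration only.
object RepartitionSortSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("repartition-sort-sketch")
      .getOrCreate()

    // Assumption: turn off broadcast joins so the join below actually shuffles.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    val df = spark.range(0, 100000).toDF("id")
    df.createOrReplaceTempView("t") // "t" is an arbitrary view name

    // Baseline: the configured number of shuffle partitions.
    println("spark.sql.shuffle.partitions = " +
      spark.conf.get("spark.sql.shuffle.partitions"))

    // Case 1: outer join against a repartitioned side.
    val joined = df.join(df.repartition(10), Seq("id"), "left")
    println("join partitions = " + joined.rdd.getNumPartitions)

    // Case 2: repartition followed by a global sort (Dataset API).
    val sorted = df.repartition(10).sort("id")
    println("repartition + sort partitions = " + sorted.rdd.getNumPartitions)

    // Case 3: the same intent expressed with the REPARTITION hint in SQL.
    val hinted = spark.sql("SELECT /*+ REPARTITION(10) */ id FROM t ORDER BY id")
    println("hint + ORDER BY partitions = " + hinted.rdd.getNumPartitions)

    spark.stop()
  }
}
```

Comparing the printed counts for cases 2 and 3 against the baseline shows whether the requested partition number survives the global sort on a given build.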
