stczwd commented on issue #26946: [SPARK-30036][SQL] Fix: REPARTITION hint does 
not work with order by
URL: https://github.com/apache/spark/pull/26946#issuecomment-571034381
 
 
   > Yes it is. But it is similar to an outer join. `df.join(df.repartition(10), 
Seq("id"), "left")` results in 200 partitions, and now 
`df.repartition(10).sort("id")` results in 10 partitions. Should they be the same?
   > Sorry for the wrong example. I mean users should use the right way to 
change the partition number. Obviously `df.repartition(10).sort("id")` should 
return the number of partitions set by `spark.sql.shuffle.partitions`.
   
   Thanks for paying attention to this. The main question you raise is whether 
we should change the partition number for OrderedDistribution. 
   Hm, you added the `REPARTITION` hint in 
[SPARK-28746](https://github.com/apache/spark/pull/25464), so you may know what 
it means to users. In other cases, the `REPARTITION` hint changes the number of 
result partitions through a shuffle, but it did not work with order by, which 
confused users. `REPARTITION` is a great way for users to control the number of 
final result partitions, so we should keep it working for every query.
   Besides, `sort("id")` requires a global OrderedDistribution, which usually 
produces the final result. It is not easy to control that partition number 
through defaultShufflePartitions, especially in large queries with multiple 
shuffles. 
   Finally, `df.repartition(10).sort("id")` is a good way for users to control 
the shuffle and the final result by changing the partition number. Users 
probably won't write `df.repartition(10).sort("id")` unless they want to change 
the final partition number. It is not a common pattern in other scenarios.
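
   For anyone who wants to check the partition counts discussed here locally, 
a minimal sketch follows. It assumes a local Spark 3.x session; the object 
name, the view name `t`, and the row count are hypothetical and chosen only 
for illustration. The numbers printed depend on the Spark version and on 
settings such as adaptive query execution, so no expected output is claimed.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of the PR.
object RepartitionSortCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("repartition-sort-check")
      .getOrCreate()

    // Small illustrative dataset with a single "id" column.
    val df = spark.range(0, 1000).toDF("id")
    df.createOrReplaceTempView("t")

    // Dataset API form discussed in the thread.
    val viaApi = df.repartition(10).sort("id").rdd.getNumPartitions

    // SQL form using the REPARTITION hint together with ORDER BY.
    val viaHint = spark
      .sql("SELECT /*+ REPARTITION(10) */ * FROM t ORDER BY id")
      .rdd
      .getNumPartitions

    // The session-wide default used for shuffles.
    val defaultShuffle = spark.conf.get("spark.sql.shuffle.partitions")

    println(s"repartition(10).sort: $viaApi partitions")
    println(s"REPARTITION(10) hint + ORDER BY: $viaHint partitions")
    println(s"spark.sql.shuffle.partitions: $defaultShuffle")

    spark.stop()
  }
}
```

   Comparing the first two numbers with the third shows whether the requested 
partition number or the default shuffle setting decided the final result.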
   
   Correct me if I'm wrong. Thanks
