cloud-fan commented on issue #26946: [SPARK-30036][SQL] Fix: REPARTITION hint does not work with order by
URL: https://github.com/apache/spark/pull/26946#issuecomment-571101291
 
 
   What `.sort` should guarantee is that the output is ordered; users shouldn't care about the number of partitions.
   
   It's more efficient to shuffle only once for the query `df.repartition(10).sort("id")`. This is just an optimization and has nothing to do with semantics.
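To see why skipping the extra shuffle changes nothing observable, here is a toy, pure-Python model. It is not Spark code, and every function name in it is hypothetical. It assumes a global sort works the way Spark typically plans `sort`/`orderBy`: range-partition the rows on the sort key, then sort each partition locally. Because the range shuffle redistributes every row anyway, the round-robin `repartition(10)` in front of it affects only cost, not the collected order.

```python
import bisect
from itertools import chain


def round_robin_repartition(partitions, n):
    """Toy stand-in for repartition(n): deal rows out round-robin into n partitions."""
    out = [[] for _ in range(n)]
    for i, row in enumerate(chain.from_iterable(partitions)):
        out[i % n].append(row)
    return out


def global_sort(partitions, bounds):
    """Toy stand-in for a global sort: range-partition by `bounds`, then sort locally.

    Row r goes to partition i, where i is the number of bounds strictly less
    than r, so every row in partition i is <= every row in partition i + 1.
    """
    out = [[] for _ in range(len(bounds) + 1)]
    for row in chain.from_iterable(partitions):
        out[bisect.bisect_left(bounds, row)].append(row)
    return [sorted(p) for p in out]


def collect(partitions):
    """Concatenate partitions in order, like collecting the result to the driver."""
    return list(chain.from_iterable(partitions))


data = [[7, 3, 9], [1, 8], [5, 2, 6, 4, 0]]
bounds = [3, 6]  # hypothetical range boundaries -> 3 output partitions

with_repartition = global_sort(round_robin_repartition(data, 10), bounds)
without_repartition = global_sort(data, bounds)

print(collect(with_repartition))     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(collect(without_repartition))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Both runs end with 3 partitions rather than 10. In this model the partition count is set by the sort's own shuffle, which is why callers should rely only on the ordering.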
   
   For `df.join(df.repartition(10), Seq("id"), "left")`, again, we don't care about the number of result partitions. If there is a way to save shuffles, please propose it.
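The same reasoning can be sketched for the join with another toy, pure-Python model. The helpers are hypothetical, Python's built-in `hash` stands in for Spark's hash function, and a shuffle partition count of 4 is assumed. A shuffled left join hash-partitions both sides on the join key. That hash shuffle overwrites the round-robin `repartition(10)` on the right side, so the joined rows are identical with or without it, and the join runs over 4 partitions either way.

```python
from collections import Counter, defaultdict
from itertools import chain


def round_robin_repartition(partitions, n):
    """Toy stand-in for repartition(n): deal rows out round-robin into n partitions."""
    out = [[] for _ in range(n)]
    for i, row in enumerate(chain.from_iterable(partitions)):
        out[i % n].append(row)
    return out


def hash_partition(partitions, key, n):
    """Toy hash shuffle: send each row to partition hash(row[key]) % n."""
    out = [[] for _ in range(n)]
    for row in chain.from_iterable(partitions):
        out[hash(row[key]) % n].append(row)
    return out


def shuffled_left_join(left, right, key, n):
    """Toy left join: co-partition both sides by key, then join each partition pair locally."""
    result = []
    for left_part, right_part in zip(hash_partition(left, key, n),
                                     hash_partition(right, key, n)):
        # Build a lookup table for the right side of this partition only.
        index = defaultdict(list)
        for right_row in right_part:
            index[right_row[key]].append(right_row)
        for left_row in left_part:
            matches = index.get(left_row[key])
            if matches:
                result.extend((left_row, match) for match in matches)
            else:
                # Left join: keep unmatched left rows, padded with None.
                result.append((left_row, None))
    return result


ID = 0             # rows are (id, value) tuples; the join key is column 0
SHUFFLE_PARTS = 4  # hypothetical shuffle partition count
df = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d")]]

with_repartition = shuffled_left_join(df, round_robin_repartition(df, 10), ID, SHUFFLE_PARTS)
without_repartition = shuffled_left_join(df, df, ID, SHUFFLE_PARTS)

print(Counter(with_repartition) == Counter(without_repartition))  # True
```

Comparing with `Counter` treats the output as a multiset, since a join makes no promise about row order.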

----------------------------------------------------------------
This is an automated message from the Apache Git Service.