hvanhovell commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658612372
Ehh... AFAIK nested ordering can be ignored from a relational algebra point of view, so I am not sure this is a very solid argument. This feels a bit like an example of [Hyrum's law](https://www.hyrumslaw.com/). If you want sorted runs in ORC then you ought to fix it there, and not rely on some implicit system behavior.

Regarding the shuffles: if the data is sorted before it goes into the shuffle, then the individual shuffle blocks are sorted. This is also the reason why doing a sort aggregate is not completely terrible (TimSort is good at identifying sorted runs).
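
To illustrate the TimSort point, here is a rough sketch. It is not Spark code: it uses Python's built-in sort (a TimSort-family implementation) as a stand-in, and the block count, block size, and helper names `count_comparisons` and `count_runs` are made up for the illustration. It concatenates a few individually sorted "blocks" and compares how many comparisons the sort needs on that input versus on the same values shuffled.

```python
import random
from functools import cmp_to_key


def count_comparisons(values):
    """Sort a copy of `values` and return how many comparisons the sort made."""
    calls = 0

    def cmp(a, b):
        nonlocal calls
        calls += 1
        return (a > b) - (a < b)

    sorted(values, key=cmp_to_key(cmp))
    return calls


def count_runs(values):
    """Return the number of maximal non-decreasing runs in `values`."""
    if not values:
        return 0
    return 1 + sum(1 for a, b in zip(values, values[1:]) if b < a)


# Hypothetical sizes, chosen only for the demonstration.
rng = random.Random(0)
num_blocks, block_size = 8, 1000

# Each block is sorted on its own, like a shuffle block fed from sorted input.
blocks = [
    sorted(rng.randrange(1_000_000) for _ in range(block_size))
    for _ in range(num_blocks)
]
concatenated = [v for block in blocks for v in block]

# Same values, but with the sorted runs destroyed.
shuffled = concatenated[:]
rng.shuffle(shuffled)

print("runs in concatenated input:", count_runs(concatenated))
print("comparisons, concatenated :", count_comparisons(concatenated))
print("comparisons, shuffled     :", count_comparisons(shuffled))
```

The concatenated input has at most as many runs as there are blocks, so the sort mostly just merges existing runs and should need noticeably fewer comparisons than on the shuffled copy.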