allisonwang-db opened a new pull request #30093:
URL: https://github.com/apache/spark/pull/30093


   ### What changes were proposed in this pull request?
   This PR fixes a correctness bug in the optimizer rule 
`EliminateSorts`. A global sort should not be eliminated merely because its 
child is ordered, since the child's ordering may be local (within each 
partition) rather than global. For example, in the following plan, the outer 
(global) sort must not be removed: it provides a stronger guarantee than the 
inner (local) sort, even though both sorts use the same sort orders. 
   
   ```
   Sort(orders, global = True, ...)
     Sort(orders, global = False, ...)
   ```
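
   To illustrate why a local sort gives a weaker guarantee than a global one, 
here is a minimal toy model in plain Python. It is not Spark code: a dataset 
is modelled as a list of partitions, and the helper names `local_sort`, 
`global_sort`, and `flatten` are hypothetical.

```python
# Toy model, not Spark API: a "dataset" is a list of partitions (lists of ints).

def local_sort(partitions):
    """Sort rows within each partition; rows never move between partitions."""
    return [sorted(p) for p in partitions]

def global_sort(partitions):
    """Sort all rows, then split them back into contiguous, range-like partitions."""
    rows = sorted(r for p in partitions for r in p)
    n = len(partitions)
    size = -(-len(rows) // n) if n else 0  # ceiling division
    return [rows[i * size:(i + 1) * size] for i in range(n)]

def flatten(partitions):
    """Read the partitions back in order, as a consumer of the whole dataset would."""
    return [r for p in partitions for r in p]

data = [[5, 1, 9], [4, 8, 2]]
locally = flatten(local_sort(data))    # each partition is ordered, the whole is not
globally = flatten(global_sort(data))  # the whole dataset is ordered
```

   Both results report the same sort order per partition, but only the second 
is ordered across partitions, which is the guarantee the outer sort provides.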
   
   Since there is no straightforward way to identify whether a node's output 
ordering is local or global, a global sort node should only be eliminated when 
its child is ordered and is 1) another global sort or 2) a range operator.
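
   The condition above can be sketched with a small toy plan model in Python. 
This is only an illustration under stated assumptions, not the actual Catalyst 
implementation: the classes `Scan`, `Range`, and `Sort` and the functions 
`output_ordering`, `can_eliminate`, and `eliminate_sorts` are hypothetical, 
and the handling of local sorts (removed whenever the child already satisfies 
the ordering) is an assumption that the description above does not spell out.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical toy plan nodes; these are NOT Spark's Catalyst classes.
@dataclass(frozen=True)
class Scan:
    name: str

@dataclass(frozen=True)
class Range:
    start: int
    end: int  # assumed to produce a globally ordered "id" column

@dataclass(frozen=True)
class Sort:
    orders: Tuple[str, ...]
    is_global: bool
    child: object

def output_ordering(plan):
    """Ordering a node reports, without saying whether it is local or global."""
    if isinstance(plan, Sort):
        return plan.orders
    if isinstance(plan, Range):
        return ("id",)
    return ()

def can_eliminate(sort):
    child = sort.child
    actual = output_ordering(child)
    required = sort.orders
    if actual[:len(required)] != required:
        return False  # child is not ordered the way this sort requires
    if not sort.is_global:
        return True   # assumption: a local sort over an ordered child is redundant
    # A global sort is dropped only if the child is known to be globally ordered.
    return isinstance(child, Range) or (isinstance(child, Sort) and child.is_global)

def eliminate_sorts(plan):
    """Rewrite bottom-up, removing only the sorts that are provably redundant."""
    if isinstance(plan, Sort):
        candidate = Sort(plan.orders, plan.is_global, eliminate_sorts(plan.child))
        return candidate.child if can_eliminate(candidate) else candidate
    return plan

local_under_global = Sort(("a",), True, Sort(("a",), False, Scan("t")))
global_under_global = Sort(("a",), True, Sort(("a",), True, Scan("t")))
global_over_range = Sort(("id",), True, Range(0, 10))
```

   In this sketch, `eliminate_sorts(local_under_global)` returns the plan 
unchanged, while the other two plans lose their redundant outer sort.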
   
   ### Why are the changes needed?
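   To fix a correctness bug in the rule `EliminateSorts`: removing a global 
sort whose child is only locally ordered can leave the result without a 
global ordering.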
   
   ### Does this PR introduce _any_ user-facing change?
   Yes
   
   ### How was this patch tested?
   Unit tests
   




