allisonwang-db opened a new pull request #30195:
URL: https://github.com/apache/spark/pull/30195
Backport #30093 for branch-3.0. I've updated the configuration version to
2.4.8.
### What changes were proposed in this pull request?
This PR fixes a correctness bug in the optimizer rule `EliminateSorts`.
It also adds a new physical rule that removes redundant sorts which the
optimizer rule can no longer eliminate after the bug fix.
### Why are the changes needed?
A global sort should not be eliminated even if its child is ordered, since we
don't know whether the child's ordering is global or local. For example, in the
following scenario, the first sort shouldn't be removed because it gives a
stronger guarantee than the second sort, even though the sort orders are the
same for both sorts.
```
Sort(orders, global = True, ...)
Sort(orders, global = False, ...)
```
Since there is no straightforward way to identify whether a node's output
ordering is local or global, we should not remove a global sort even if its
child is already ordered.
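
To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark code: a dataset is modelled as a list of partitions, and the helper names (`local_sort`, `global_sort`, `is_locally_ordered`, `is_globally_ordered`) are hypothetical, introduced only for illustration. It shows that output which is ordered within each partition is not necessarily ordered across partitions, so dropping the outer global sort would change the result.

```python
# Toy model (assumption): a dataset is a list of partitions,
# and each partition is a list of integers.

def local_sort(partitions):
    """Models Sort(orders, global = False): sort each partition on its own."""
    return [sorted(p) for p in partitions]

def global_sort(partitions):
    """Models Sort(orders, global = True): a total order across partitions."""
    num = len(partitions)
    rows = sorted(x for p in partitions for x in p)
    # Split the totally ordered rows back into `num` contiguous ranges.
    size = -(-len(rows) // num) if rows else 0  # ceiling division
    return [rows[i * size:(i + 1) * size] for i in range(num)]

def is_locally_ordered(partitions):
    """True if every partition is sorted internally."""
    return all(p == sorted(p) for p in partitions)

def is_globally_ordered(partitions):
    """True if reading the partitions in sequence yields sorted rows."""
    flat = [x for p in partitions for x in p]
    return flat == sorted(flat)

data = [[5, 1, 9], [4, 8, 2]]

inner = local_sort(data)    # child: Sort(orders, global = False)
outer = global_sort(inner)  # parent: Sort(orders, global = True)

# The child is "ordered", but only within each partition.
print(inner, is_locally_ordered(inner), is_globally_ordered(inner))
# Only the parent sort provides the global guarantee.
print(outer, is_globally_ordered(outer))
```

In this model, eliminating the outer sort because "the child is already ordered" would return `inner`, which is not globally ordered.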
### Does this PR introduce any user-facing change?
Yes
### How was this patch tested?
Unit tests