ulysses-you commented on PR #37284:
URL: https://github.com/apache/spark/pull/37284#issuecomment-1196193599
As I mentioned in the PR description, the output ordering of the logical global
limit depends on the specific physical plan. Consider how GlobalLimitExec is
planned:
GlobalLimitExec uses its child's output ordering as its own output ordering,
and it requires the `AllTuples` distribution. So there are two cases:
1. The child's output partitioning is not a single partition, so it does not
satisfy the required distribution. Then the plan must be:
```
GlobalLimitExec
+- ShuffleExchangeExec
   +- child exec
```
Because the output ordering of ShuffleExchangeExec is unknown, the output
ordering of GlobalLimitExec is unknown too. The key point is that
GlobalLimitExec uses the output ordering of ShuffleExchangeExec rather than
that of the original child.
In this case, the output ordering of the logical global limit is wrong. This is
easy to reproduce by adding an extra local sort on top of the global limit; see
the example at https://github.com/apache/spark/pull/37284#issuecomment-1195169526
2. The child's output partitioning is a single partition, which satisfies the
required distribution. Then the plan must be:
```
GlobalLimitExec
+- child exec
```
In this case, the output ordering of GlobalLimitExec is the same as that of the
original child, so the output ordering of the logical global limit is correct.
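The two cases can be illustrated with a toy model. This is a hypothetical Python sketch, not Spark's code: `Plan`, `shuffle_to_single_partition`, and `plan_global_limit` are illustrative stand-ins for the physical nodes and for the planning step (EnsureRequirements in Spark) that inserts a shuffle when a child does not satisfy the required distribution. Orderings are modeled as tuples of column names, and an empty tuple means "unknown".

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical physical node: name, partition count, output ordering.

    An empty ``output_ordering`` tuple stands for "ordering unknown".
    """
    name: str
    num_partitions: int
    output_ordering: Tuple[str, ...]


def shuffle_to_single_partition(child: Plan) -> Plan:
    # The exchange gathers rows from every child partition with no ordering
    # guarantee, so its output ordering is unknown.
    return Plan(f"ShuffleExchange({child.name})", 1, ())


def plan_global_limit(child: Plan) -> Plan:
    # The limit requires all tuples in a single partition (AllTuples).
    # A multi-partition child does not satisfy that, so a shuffle is inserted.
    if child.num_partitions != 1:
        child = shuffle_to_single_partition(child)
    # The limit reports the ordering of its *actual* physical child.
    return Plan(f"GlobalLimit({child.name})", 1, child.output_ordering)


# Case 1: child sorted by "a" within each of 4 partitions.
case1 = plan_global_limit(Plan("child", 4, ("a",)))
print(case1.name, case1.output_ordering)  # GlobalLimit(ShuffleExchange(child)) ()

# Case 2: child sorted by "a" in a single partition.
case2 = plan_global_limit(Plan("child", 1, ("a",)))
print(case2.name, case2.output_ordering)  # GlobalLimit(child) ('a',)
```

A logical global limit that simply inherits its child's ordering would report `("a",)` in both cases; only case 2 agrees with what the physical plan actually produces.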
I think that's why @viirya thinks the output ordering in the logical plan is
flaky. I understand that it comes from history: we first had `EliminateSorts`
on the logical side and later added `RemoveRedundantSorts` on the physical
side. I don't think it's a big problem, since it has been there for a while and
it can optimize out unnecessary sorts early, before physical planning. So I
just fix the bug and leave the rest as is.
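The difference between removing a sort early on the logical side and removing it on the physical side can be sketched as follows. This is a hypothetical Python model, not Spark's implementation: `sort_is_redundant` stands in for the kind of "does the child already provide this ordering?" check a sort-removal rule makes, and orderings are tuples of column names. A logical rule such as `EliminateSorts` can only consult the ordering the logical plan claims, whereas a physical rule such as `RemoveRedundantSorts` sees the ordering the chosen physical plan actually produces.

```python
from typing import Sequence


def sort_is_redundant(child_ordering: Sequence[str], required: Sequence[str]) -> bool:
    # Simplified check: a sort is redundant if the ordering its child already
    # provides starts with the required ordering. Spark compares SortOrder
    # expressions (direction, null ordering, and so on); this sketch compares
    # plain column names only.
    return tuple(child_ordering[: len(required)]) == tuple(required)


# Case 1 from above: a global limit over a 4-partition child sorted by "a",
# with an extra local sort by "a" placed on top of the limit.
logical_claim = ("a",)  # ordering the logical limit claims (inherited from its child)
physical_actual = ()    # ordering GlobalLimitExec really has after the shuffle

print(sort_is_redundant(logical_claim, ("a",)))    # True: the sort would be removed early
print(sort_is_redundant(physical_actual, ("a",)))  # False: the sort is kept
```

The early logical removal is only safe when the ordering the logical plan claims holds for every physical plan it can turn into, which is exactly what fails in case 1.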