ulysses-you commented on PR #37284:
URL: https://github.com/apache/spark/pull/37284#issuecomment-1196193599

   As I mentioned in the PR description, the output ordering of the logical 
global limit depends on the specific physical plan. Consider the strategy of 
GlobalLimitExec:
   
   GlobalLimitExec uses its child's output ordering as its own output ordering, 
and it also requires the AllTuples distribution. So there are two cases:
   1. The child's output partitioning is not a single partition, so it does not 
satisfy the required distribution. The plan must then be:
   ```
   GlobalLimitExec
     ShuffleExchangeExec
       child exec
   ```
   Because the output ordering of ShuffleExchangeExec is unknown, the 
GlobalLimitExec output ordering is also unknown. The key point is that 
GlobalLimitExec uses the output ordering of ShuffleExchangeExec rather than 
that of the original child.
   
   In this case, the output ordering of the logical global limit is broken. 
It is easy to reproduce by adding an extra local sort on top of the global 
limit. See the example at 
https://github.com/apache/spark/pull/37284#issuecomment-1195169526
   
   2. The child's output partitioning is a single partition, which satisfies 
the required distribution. The plan must then be:
   ```
   GlobalLimitExec
     child exec
   ```
   In this case, the output ordering of GlobalLimitExec is the same as that of 
the original child, so the output ordering of the logical global limit holds.
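
The two cases above can be sketched with a small toy model. This is plain Python, not Spark code: `Exec`, `shuffle_to_single`, `plan_global_limit` and `tree` are hypothetical names that only mimic the behavior described in this comment (a global limit that needs a single partition and reuses the ordering of its direct child).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Exec:
    """Toy stand-in for a physical operator (not the real Spark class)."""
    name: str
    child: Optional["Exec"] = None
    num_partitions: int = 1
    ordering: List[str] = field(default_factory=list)


def shuffle_to_single(child: Exec) -> Exec:
    # Assumption taken from the comment: a shuffle reports no output ordering.
    return Exec("ShuffleExchangeExec", child, num_partitions=1, ordering=[])


def plan_global_limit(child: Exec) -> Exec:
    # The limit needs all tuples in one partition; add a shuffle if they are not.
    if child.num_partitions != 1:
        child = shuffle_to_single(child)
    # The limit reuses the ordering of its direct child, whatever that is.
    return Exec("GlobalLimitExec", child, 1, ordering=list(child.ordering))


def tree(plan: Optional[Exec]) -> List[str]:
    # Flatten the single-child plan into a top-down list of operator names.
    names = []
    while plan is not None:
        names.append(plan.name)
        plan = plan.child
    return names


# Case 1: multi-partition child, so a shuffle is inserted and ordering is lost.
case1 = plan_global_limit(Exec("child exec", num_partitions=4, ordering=["a ASC"]))
# Case 2: single-partition child, so the child's ordering is kept.
case2 = plan_global_limit(Exec("child exec", num_partitions=1, ordering=["a ASC"]))

print(tree(case1), case1.ordering)
print(tree(case2), case2.ordering)
```

In this sketch the same logical limit ends up with an empty ordering in case 1 and with `["a ASC"]` in case 2, which is the dependency on the physical plan described above.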
   
   
   I think that's why @viirya thinks the output ordering in the logical plan is 
flaky. I understand that this is historical: we first had `EliminateSorts` on 
the logical side and later added `RemoveRedundantSorts` on the physical side. 
I don't think it's a big problem since it already exists, and it can optimize 
out unnecessary sorts early, before going to the physical plan. So I just fix 
the bug and leave the rest as is.
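
To illustrate at the data level why removing a sort above the limit is unsafe when a shuffle sits in between, here is a minimal sketch in plain Python. It is not Spark code: `global_limit_after_shuffle` is a hypothetical helper, and the fixed `arrival_order` is a simplifying assumption standing in for the unspecified order in which a real shuffle delivers rows.

```python
from typing import List


def global_limit_after_shuffle(
    partitions: List[List[int]], n: int, arrival_order: List[int]
) -> List[int]:
    # Gather every partition into one, in whatever order the blocks arrive,
    # then keep the first n rows (the global limit).
    merged = [row for i in arrival_order for row in partitions[i]]
    return merged[:n]


# Two partitions, each sorted locally, as a child with a known ordering would produce.
multi = [[1, 4, 7], [2, 5, 8]]
limited_multi = global_limit_after_shuffle(multi, 4, arrival_order=[1, 0])

# A single sorted partition: nothing is gathered, so the ordering survives the limit.
single = [[1, 2, 4, 5]]
limited_single = global_limit_after_shuffle(single, 3, arrival_order=[0])

print(limited_multi, limited_multi == sorted(limited_multi))
print(limited_single, limited_single == sorted(limited_single))
```

With the multi-partition input the limited rows are no longer sorted, so a sort placed above the limit is not redundant there, while in the single-partition case it is.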


