Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16633
  
    That case only happens when the total row count across all partitions is
    less than or (nearly) equal to the limit number, so it needs to scan
    (almost) all partitions.
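    The worst case can be made concrete with a toy model. It assumes, and
    this is not taken from the PR, that partitions are read one after another
    and that reading stops once the running row count reaches the limit.
    `partitions_to_scan` is a hypothetical helper used only for illustration.

```python
from itertools import accumulate


def partitions_to_scan(row_counts: list[int], limit: int) -> int:
    """Number of partitions read, in order, before `limit` rows are collected.

    Toy model (assumption): partitions are consumed sequentially and scanning
    stops as soon as the running total of rows reaches `limit`.
    """
    for i, total in enumerate(accumulate(row_counts), start=1):
        if total >= limit:
            return i
    # The total never reaches the limit, so every partition is scanned.
    return len(row_counts)


# Plenty of rows: the limit is satisfied after the first two partitions.
print(partitions_to_scan([600, 600, 600, 600], limit=1000))
# Total rows (400) below the limit: all four partitions are scanned.
print(partitions_to_scan([100, 100, 100, 100], limit=1000))
# Total rows (1000) nearly equal to the limit (990): all four are scanned.
print(partitions_to_scan([250, 250, 250, 250], limit=990))
```

    The last two calls correspond to the case described above: when the
    total is below or close to the limit, nearly every partition is read.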
    
    One possible way to deal with this case is to use row count statistics to
    decide whether to use this global limit without shuffle or the old global
    limit.
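    A minimal sketch of such a decision rule, assuming an estimated total row
    count is available from statistics. The function name, the strategy
    labels and the `ratio` threshold are all hypothetical and are not Spark
    APIs; the fallback when statistics are missing is likewise an assumption.

```python
from typing import Optional


def choose_global_limit(estimated_row_count: Optional[int], limit: int,
                        ratio: float = 2.0) -> str:
    """Pick a global-limit strategy from a row count estimate.

    Hypothetical heuristic: if statistics are missing, or the estimated total
    row count is less than `ratio * limit`, most partitions would likely be
    scanned anyway, so fall back to the old global limit. Otherwise use the
    global limit without shuffle.
    """
    if estimated_row_count is None or estimated_row_count < ratio * limit:
        return "old_global_limit"
    return "global_limit_without_shuffle"


print(choose_global_limit(10_000, 100))  # far more rows than the limit
print(choose_global_limit(150, 100))     # total close to the limit
print(choose_global_limit(None, 100))    # no statistics available
```

    Falling back when statistics are missing is a conservative choice; a real
    implementation could just as well default to the new strategy.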

