Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170597130

@gatorsmile I do see the performance benefits of ```limit``` while processing. The reservation I have is reasoning about non-top-level ```limit``` statements. A set-operator example:

    select a from db.tbl_a
    intersect
    select b from db.tbl_b

The result should be all distinct rows in ```a``` for which we can find an equal tuple in ```b```. Let's add limits to this:

    select a from db.tbl_a limit 10
    intersect
    select b from db.tbl_b limit 10

The result would now be the first (distinct?) 10 rows from ```a```, filtered by checking whether they exist in the first 10 rows of ```b``` (I think). I am not sure this is what a user expects. Furthermore:

- You will probably end up with fewer than 10 rows here.
- The results will probably be non-deterministic (unless you would also allow some kind of ordering in a subquery).

Do you have a concrete real-world example where you need this? I don't really mind if we put this back in the parser (the engine supports it anyway), but I don't think we should just do something like this without some consideration.
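The pitfall described above can be sketched with a toy in-memory example. This is plain Python with made-up table contents, not Spark itself; it only simulates applying ```limit``` to each side before the intersect:

```python
# Hypothetical sketch: LIMIT applied to each side *before* INTERSECT.
# tbl_a and tbl_b hold the same 20 values, but the scan order of each
# table is unspecified -- these two orders are just one possibility.
tbl_a = list(range(1, 21))      # scanned ascending: 1..20
tbl_b = list(range(20, 0, -1))  # scanned descending: 20..1

# select a from tbl_a limit 10 intersect select b from tbl_b limit 10
limited_a = tbl_a[:10]          # rows 1..10
limited_b = tbl_b[:10]          # rows 20..11
result = set(limited_a) & set(limited_b)

# Every row of tbl_a has a match in tbl_b, yet the limited intersection
# is empty here -- and a different scan order would give a different
# answer, which is the non-determinism concern.
print(len(result))              # 0 for these particular scan orders
```

With no ordering guarantee inside the subqueries, the same query can return anywhere from 0 to 10 rows depending on how each table happens to be scanned.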