[GitHub] spark pull request: [SPARK-12705] [SQL] Analyzer Rule ResolveSortR...

davies Tue, 19 Jan 2016 10:47:14 -0800

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10678#issuecomment-172947664
  
    @gatorsmile I took a pass, and it's correct overall. My only concern is that 
it's too powerful and could introduce side effects that we don't want. 
For example:
    ```
     df.select('a).limit(10).orderBy('b)
    ```
    We may not want to support such complicated use cases. With the DataFrame API, 
we can easily construct exactly the plan we want.
    
    Without a subquery, a SQL query can have only one SELECT clause, so it's 
reasonable to support that (other databases, at least, support it); otherwise 
you have to use a subquery.
    
    Here are the cases that we should support:
    ```
    SELECT project_list FROM tables GROUP BY keys HAVING condition ORDER BY ordering LIMIT N
    ```
    
    The columns used in HAVING and ORDER BY may not be present in the SELECT list; 
we should still resolve them. 
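
To make the ORDER BY case concrete, here is a minimal sketch on a toy plan model. The
`Plan`, `Scan`, `Project`, and `Sort` classes and the `resolveSort` helper are
hypothetical stand-ins written for this illustration, not Catalyst classes or the code
in this PR. It shows one way a sort column missing from the SELECT list could be
resolved: widen the Project under the Sort, then project the extra column away above it.

```scala
// Toy plan model for illustration only; these are NOT Spark's Catalyst classes.
object MissingSortColumns {
  sealed trait Plan { def output: Seq[String] }
  case class Scan(columns: Seq[String]) extends Plan {
    def output: Seq[String] = columns
  }
  case class Project(columns: Seq[String], child: Plan) extends Plan {
    def output: Seq[String] = columns
  }
  case class Sort(order: Seq[String], child: Plan) extends Plan {
    def output: Seq[String] = child.output
  }

  // If a Sort sits directly on a Project and orders by columns that the
  // Project drops, add those columns to the Project (when its child
  // provides them) and project them away again above the Sort.
  def resolveSort(plan: Plan): Plan = plan match {
    case Sort(order, Project(cols, child)) =>
      val missing = order
        .filterNot(c => cols.contains(c))
        .filter(c => child.output.contains(c))
        .distinct
      if (missing.isEmpty) plan
      else Project(cols, Sort(order, Project(cols ++ missing, child)))
    case other => other
  }

  def main(args: Array[String]): Unit = {
    // Roughly: SELECT a FROM t ORDER BY b
    val plan = Sort(Seq("b"), Project(Seq("a"), Scan(Seq("a", "b"))))
    println(resolveSort(plan))
  }
}
```

The outer Project keeps the query's visible output unchanged, so the extra column exists
only long enough to be sorted on.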
    
    Between Sort and Join/Scan, there can be Project/Filter/Window/Aggregate nodes, 
and no others.
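
One reading of this restriction, sketched on a toy plan model: walk down from the Sort
and accept only Project/Filter/Window/Aggregate until a Join or Scan is reached. The
node classes and the `supportedSort` helper are hypothetical names chosen for this
sketch, not Catalyst's API.

```scala
// Toy plan model for illustration only; these are NOT Spark's Catalyst classes.
object SortShapeCheck {
  sealed trait Plan
  case class Scan(table: String) extends Plan
  case class Join(left: Plan, right: Plan) extends Plan
  case class Project(child: Plan) extends Plan
  case class Filter(child: Plan) extends Plan
  case class Window(child: Plan) extends Plan
  case class Aggregate(child: Plan) extends Plan
  case class Limit(n: Int, child: Plan) extends Plan
  case class Sort(child: Plan) extends Plan

  // True when only Project/Filter/Window/Aggregate nodes sit between
  // this node and the first Join or Scan beneath it.
  @scala.annotation.tailrec
  def reachesSource(plan: Plan): Boolean = plan match {
    case _: Scan | _: Join => true
    case Project(child)    => reachesSource(child)
    case Filter(child)     => reachesSource(child)
    case Window(child)     => reachesSource(child)
    case Aggregate(child)  => reachesSource(child)
    case _                 => false // e.g. Limit, or a nested Sort
  }

  // A Sort is in the supported shape if everything below it, down to the
  // first Join/Scan, is one of the allowed operators.
  def supportedSort(plan: Plan): Boolean = plan match {
    case Sort(child) => reachesSource(child)
    case _           => false
  }
}
```

Under this check, a plan shaped like `df.select('a).limit(10).orderBy('b)` is rejected
because a Limit sits between the Sort and the Scan, while the plan for the single-SELECT
query form listed earlier passes.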




