[GitHub] spark pull request #14616: [SPARK-16955][SQL] Fix analysis error when using ...

clockfly Thu, 11 Aug 2016 20:06:07 -0700

GitHub user clockfly opened a pull request:

    https://github.com/apache/spark/pull/14616


    [SPARK-16955][SQL] Fix analysis error when using ordinal in ORDER BY or GROUP BY

    ## What changes were proposed in this pull request?
    
    This PR adds two unresolved expressions, `GroupByOrdinal` and `OrderByOrdinal`, to represent ordinals in GROUP BY and ORDER BY clauses, and fixes the analyzer rules that resolve ordinals.
    
    Ordinals in GROUP BY or ORDER BY, such as the `1` in `order by 1` or `group by 1`, should be treated as unresolved expressions before analysis. In the current code, however, an ordinal is represented directly as a `Literal`, which is already a resolved expression. This can cause an analysis failure when a rule requires the ordinal to be resolved before it applies: the rule sees a resolved `Literal` and proceeds, even though the ordinal has not yet been replaced by the select-list item it refers to.
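
    A minimal toy sketch of this distinction, in plain Scala rather than Spark's actual Catalyst classes. `OrdinalModel`, `Expr`, and `readyForAggregateRules` are hypothetical names, and `Literal` / `GroupByOrdinal` only mimic the real expressions. A constant reports itself as resolved, so a rule guarded on "all grouping expressions are resolved" fires too early, while a dedicated ordinal placeholder holds the rule back.

    ```scala
    // Toy model only: hypothetical stand-ins, not Catalyst's expression classes.
    object OrdinalModel {
      sealed trait Expr { def resolved: Boolean }

      // A constant is always resolved, like a Literal in Catalyst.
      final case class Literal(value: Int) extends Expr { val resolved = true }

      // Placeholder meaning "the n-th (1-based) item of the select list";
      // it stays unresolved until a rule replaces it with that item.
      final case class GroupByOrdinal(position: Int) extends Expr { val resolved = false }

      // Typical guard: only run aggregate-related rules once every
      // grouping expression reports itself as resolved.
      def readyForAggregateRules(grouping: Seq[Expr]): Boolean =
        grouping.forall(_.resolved)
    }
    ```

    With the ordinal stored as `Literal(2)`, `readyForAggregateRules` returns `true` even though position `2` has not been mapped to a column; with `GroupByOrdinal(2)` it returns `false`.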
    
    **For example:**
    
    Before this fix, the rule `ResolveAggregateFunctions` tries to resolve the `Filter` before the `Filter`'s child `Aggregate` is fully resolved (the `Aggregate` still contains the unresolved GROUP BY ordinal `2`):
    
    ```
    'Filter ('a > 0)
       +- Aggregate [2], [count(1) AS count(1)#83L, a#81]
            +- SubqueryAlias tmp
                +- Project [1 AS a#81]
                     +- OneRowRelation$
    ```
    
    ### Before this change
    
    Ordinals are stored as `Literal` expressions:
    
    ```
    scala> sc.setLogLevel("TRACE")
    scala> sql("select a from t group by 1 order by 1")
    ...
    'Sort [1 ASC], true
     +- 'Aggregate [1], ['a]
         +- 'UnresolvedRelation `t`
    ```
    
    This causes an analysis error when the rule `ResolveAggregateFunctions` is applied, because the GROUP BY ordinal `2` claims to be resolved but actually is not:
    
    ```
    scala> Seq(1).toDF("a").createOrReplaceTempView("t")
    scala> sql("select count(a), a from t group by 2 having a > 0").show
    org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to Group by position: '2' exceeds the size of the select list '1'. on unresolved object, tree:
    Aggregate [2], [(a#9 > 0) AS havingCondition#15]
    +- SubqueryAlias t
       +- Project [value#7 AS a#9]
          +- LocalRelation [value#7]
    ...
    ```
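
    As the tree in the error suggests, the ordinal `2` ends up being looked up in an `Aggregate` whose select list has a single item, `havingCondition`. A hypothetical sketch of 1-based ordinal substitution with a bounds check illustrates this (the object, class names, and error text are illustrative, not Spark's): against the original select list `[count(a), a]` the ordinal maps to `a`, but against a rewritten one-item list it is out of range.

    ```scala
    // Hypothetical sketch: replace 1-based GROUP BY ordinals with select-list items.
    object OrdinalSubstitution {
      sealed trait Expr
      final case class Attr(name: String) extends Expr
      final case class GroupByOrdinal(position: Int) extends Expr

      def substitute(grouping: Seq[Expr], selectList: Seq[Expr]): Either[String, Seq[Expr]] = {
        // Collect every ordinal that does not point at an item of the select list.
        val outOfRange = grouping.collect {
          case GroupByOrdinal(p) if p < 1 || p > selectList.size => p
        }
        if (outOfRange.nonEmpty)
          Left(s"GROUP BY position ${outOfRange.head} is not in the select list (size ${selectList.size})")
        else
          Right(grouping.map {
            case GroupByOrdinal(p) => selectList(p - 1) // ordinals are 1-based
            case other             => other
          })
      }
    }
    ```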
    
    ### After this change
    
    Ordinals are stored as `GroupByOrdinal` or `OrderByOrdinal`.
    
    ```
    scala> sc.setLogLevel("TRACE")
    scala> sql("select a from t group by 1 order by 1")
    ...
    'Sort [orderbyordinal(1) ASC], true
     +- 'Aggregate [groupbyordinal(1)], ['a]
          +- 'UnresolvedRelation `t`
    ```
    
    The rule `ResolveAggregateFunctions` can now be applied safely, because `GroupByOrdinal(2)` is explicitly resolved before the rule runs:
    
    ```
    scala> Seq(1).toDF("a").createOrReplaceTempView("t")
    scala> sql("select count(a), a from t group by 2 having a > 0").show
    +--------+---+
    |count(a)|  a|
    +--------+---+
    |       1|  1|
    +--------+---+
    ```
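
    The effect of resolving ordinals first can be sketched with a hypothetical toy analyzer (all names below are illustrative, not Catalyst's API). A HAVING-style rewrite that replaces the aggregate's select list is refused while the grouping still holds an unresolved ordinal, and becomes safe once the ordinal has been replaced by the select-list item it points to, because position `2` no longer needs to be looked up in the rewritten list.

    ```scala
    // Toy analyzer sketch: hypothetical classes, not Spark's Catalyst API.
    object RuleOrdering {
      sealed trait Expr { def resolved: Boolean }
      // Simplification: aggregate calls such as count(a) are modelled as plain attributes.
      final case class Attr(name: String) extends Expr { val resolved = true }
      final case class GroupByOrdinal(position: Int) extends Expr { val resolved = false }

      final case class Aggregate(grouping: Seq[Expr], select: Seq[Expr]) {
        def resolved: Boolean = grouping.forall(_.resolved) && select.forall(_.resolved)
      }

      // Rule 1: replace in-range ordinals with the aggregate's own select-list items.
      def resolveOrdinals(agg: Aggregate): Aggregate =
        agg.copy(grouping = agg.grouping.map {
          case GroupByOrdinal(p) if p >= 1 && p <= agg.select.size => agg.select(p - 1)
          case other => other
        })

      // Rule 2: a HAVING-style rewrite that replaces the select list with the
      // condition; it only fires once the aggregate is fully resolved.
      def rewriteHaving(agg: Aggregate, condition: Expr): Option[Aggregate] =
        if (agg.resolved) Some(agg.copy(select = Seq(condition))) else None
    }
    ```

    Only the order of the two rules differs between the failing and the working case; the HAVING-style rewrite itself is unchanged.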
    
    ## How was this patch tested?
    
    Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/clockfly/spark spark-16955

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14616.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14616
    
----
commit 40873650c7397a339210092f616c15aedbf13b17
Author: Sean Zhong <[email protected]>
Date:   2016-08-08T21:40:53Z

    [SPARK-16955][SQL] Fix analysis error when using ordinal in ORDER BY or GROUP BY

----


