GitHub user clockfly opened a pull request:
https://github.com/apache/spark/pull/14616
[SPARK-16955][SQL] Fix analysis error when using ordinal in ORDER BY or
GROUP BY
## What changes were proposed in this pull request?
This PR adds two unresolved expressions, `GroupByOrdinal` and `OrderByOrdinal`,
to represent ordinals in GROUP BY and ORDER BY, and fixes the rules that
resolve ordinals.
Ordinals in GROUP BY or ORDER BY, like `1` in `order by 1` or `group by 1`,
should be treated as unresolved expressions before analysis. In the current
code, however, an ordinal is represented directly as a `Literal` expression,
which is a resolved expression. This may cause analysis failure if a rule
requires the ordinal to be resolved before the rule is applied.
**For example:**
Before this fix, rule `ResolveAggregateFunctions` tries to resolve the
`Filter` before `Filter`'s child `Aggregate` is fully resolved (`Aggregate`
contains an unresolved group by ordinal `2`):
```
'Filter ('a > 0)
+- Aggregate [2], [count(1) AS count(1)#83L, a#81]
   +- SubqueryAlias tmp
      +- Project [1 AS a#81]
         +- OneRowRelation$
```
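To make the failure mode concrete, here is a minimal toy sketch in Scala. It does not use Spark's Catalyst classes: `Expr`, `Lit`, `GroupByOrdinalToy`, `AggregateToy`, and `ruleMayFire` are hypothetical names introduced only for illustration. It shows why a literal-based ordinal lets a parent rule fire too early, while a dedicated unresolved placeholder does not.

```scala
// Toy model only: these are NOT Spark's Catalyst classes.
object ResolvedFlagSketch {
  sealed trait Expr { def resolved: Boolean }

  // A plain literal always reports itself as resolved.
  final case class Lit(value: Int) extends Expr {
    val resolved: Boolean = true
  }

  // Hypothetical placeholder for "the N-th select-list item".
  // It stays unresolved until a rule replaces it.
  final case class GroupByOrdinalToy(ordinal: Int) extends Expr {
    val resolved: Boolean = false
  }

  // A toy aggregate node: resolved only if every grouping expression is.
  final case class AggregateToy(grouping: Seq[Expr]) {
    def resolved: Boolean = grouping.forall(_.resolved)
  }

  // A parent rule that should only fire once its child is resolved.
  def ruleMayFire(child: AggregateToy): Boolean = child.resolved
}
```

With `Lit(2)` as the grouping expression, `ruleMayFire` returns `true` even though the ordinal has not been substituted yet. With `GroupByOrdinalToy(2)`, it returns `false`, so the parent rule waits.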
### Before this change
The ordinal is stored as a `Literal` expression.
```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [1 ASC], true
+- 'Aggregate [1], ['a]
   +- 'UnresolvedRelation `t`
```
This causes an analysis error when rule `ResolveAggregateFunctions` is applied,
because the group by ordinal `2` claims to be resolved but is actually not
resolved.
```
scala> Seq(1).toDF("a").createOrReplaceTempView("t")
scala> sql("select count(a), a from t group by 2 having a > 0").show
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to
Group by position: '2' exceeds the size of the select list '1'. on unresolved
object, tree:
Aggregate [2], [(a#9 > 0) AS havingCondition#15]
+- SubqueryAlias t
   +- Project [value#7 AS a#9]
      +- LocalRelation [value#7]
...
```
### After this change
Ordinals are stored as `GroupByOrdinal` or `OrderByOrdinal`.
```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [orderbyordinal(1) ASC], true
+- 'Aggregate [groupbyordinal(1)], ['a]
   +- 'UnresolvedRelation `t`
```
Rule `ResolveAggregateFunctions` can now be applied safely, because
`GroupByOrdinal(2)` is explicitly resolved before this rule is applied.
```
scala> Seq(1).toDF("a").createOrReplaceTempView("t")
scala> sql("select count(a), a from t group by 2 having a > 0").show
+--------+---+
|count(a)|  a|
+--------+---+
|       1|  1|
+--------+---+
```
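The substitution step itself can be sketched as follows. This is a toy model under stated assumptions, not the code in this PR: `Attr`, `Count`, `GroupByOrdinalToy`, and `resolveGroupBy` are hypothetical names, and the error text is made up for illustration. The idea is that each 1-based ordinal is replaced by the select-list item it points at, and an out-of-range ordinal is reported as an error instead of being left in the plan.

```scala
// Toy model only: these are NOT Spark's Catalyst classes.
object OrdinalResolutionSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Count(child: Expr) extends Expr
  // Hypothetical unresolved placeholder for a 1-based GROUP BY position.
  final case class GroupByOrdinalToy(ordinal: Int) extends Expr

  // Replace each ordinal with the select-list item at that position.
  // Returns Left(message) if any ordinal is out of range.
  def resolveGroupBy(
      selectList: Seq[Expr],
      grouping: Seq[Expr]): Either[String, Seq[Expr]] = {
    val results: Seq[Either[String, Expr]] = grouping.map {
      case GroupByOrdinalToy(i) if i >= 1 && i <= selectList.size =>
        Right(selectList(i - 1))
      case GroupByOrdinalToy(i) =>
        Left(s"ordinal $i is out of range for a select list of size ${selectList.size}")
      case other =>
        Right(other) // non-ordinal expressions pass through unchanged
    }
    val errors = results.collect { case Left(msg) => msg }
    if (errors.nonEmpty) Left(errors.head)
    else Right(results.collect { case Right(e) => e })
  }
}
```

For the query above, the select list is `count(a), a`, so `GroupByOrdinalToy(2)` is rewritten to the attribute `a`. Against a one-item select list the same ordinal yields a `Left`.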
## How was this patch tested?
Unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/clockfly/spark spark-16955
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14616.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14616
----
commit 40873650c7397a339210092f616c15aedbf13b17
Author: Sean Zhong <[email protected]>
Date: 2016-08-08T21:40:53Z
[SPARK-16955][SQL] Fix analysis error when using ordinal in ORDER BY or
GROUP BY
----