GitHub user yhuai opened a pull request:
https://github.com/apache/spark/pull/1317
[SPARK-2339][SQL] SQL parser in sql-core is case sensitive, but a table
alias is converted to lower case when we create Subquery
Reported by
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Join-throws-exception-td8599.html
When a table has an alias, we temporarily insert a Subquery after fetching the
table from the catalog. The table alias is then converted to lower case
regardless of whether the parser is case sensitive.
To see the issue ...
```
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD
case class Person(name: String, age: Int)
val people =
  sc.textFile("examples/src/main/resources/people.txt")
    .map(_.split(","))
    .map(p => Person(p(0), p(1).trim.toInt))
people.registerAsTable("people")
sqlContext.sql("select PEOPLE.name from people PEOPLE")
```
The plan is ...
```
== Query Plan ==
Project ['PEOPLE.name]
ExistingRdd [name#0,age#1], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:176
```
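Note that `PEOPLE.name` is not resolved: the alias was lowercased when the
Subquery was created, while the case-sensitive parser kept the reference as
`PEOPLE.name`.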
This PR introduces three changes.
1. If a table has an alias, the catalog will not lowercase the alias. If a
lowercase alias is needed, the analyzer will do the work.
2. A catalog has a new val caseSensitive that indicates if this catalog is
case sensitive or not. For example, a SimpleCatalog is case sensitive, but
3. Corresponding unit tests.
With this PR, case sensitivity of database names and table names is handled
by the catalog. Case sensitivity of other identifiers is handled by the
analyzer.
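A minimal sketch of this division of responsibility, assuming simplified
stand-in types rather than the actual Spark classes. The names `SketchCatalog`,
`Relation`, and `analyzeAlias`, and the two-argument `lookupRelation`, are
hypothetical and used for illustration only; the `Subquery` here is a toy case
class, not Catalyst's.

```scala
// Hypothetical, simplified stand-ins for illustration; not the actual Spark classes.
object CaseSensitivitySketch {
  case class Relation(tableName: String)
  case class Subquery(alias: String, child: Relation)

  class SketchCatalog(val caseSensitive: Boolean) {
    private val tables = scala.collection.mutable.Map.empty[String, Relation]

    // Table names follow the catalog's own case-sensitivity setting.
    private def normalize(name: String): String =
      if (caseSensitive) name else name.toLowerCase

    def registerTable(name: String): Unit = {
      val key = normalize(name)
      tables(key) = Relation(key)
    }

    // The alias is passed through untouched; the catalog never lowercases it.
    def lookupRelation(name: String, alias: Option[String] = None): Subquery = {
      val relation = tables.getOrElse(
        normalize(name), sys.error(s"Table not found: $name"))
      Subquery(alias.getOrElse(relation.tableName), relation)
    }
  }

  // Analyzer-side step: lowercase the alias only when the analyzer is case insensitive.
  def analyzeAlias(plan: Subquery, analyzerCaseSensitive: Boolean): Subquery =
    if (analyzerCaseSensitive) plan else plan.copy(alias = plan.alias.toLowerCase)
}
```

In this sketch, looking up `people` with the alias `PEOPLE` in a case-sensitive
catalog keeps the alias as `PEOPLE`, so a reference such as `PEOPLE.name` can
still match it. Only a case-insensitive analyzer would lowercase it.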
JIRA: https://issues.apache.org/jira/browse/SPARK-2339
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yhuai/spark SPARK-2339
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1317.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1317
----
commit 12d8006f738e299a08621c382bef4a0a23a72b6f
Author: Yin Huai <[email protected]>
Date: 2014-07-07T16:55:59Z
Handling case sensitivity correctly.
This patch introduces three changes.
1. If a table has an alias, the catalog will not lowercase the alias. If a
lowercase alias is needed, the analyzer will do the work.
2. A catalog has a new val caseSensitive that indicates if this catalog is
case sensitive or not. For example, a SimpleCatalog is case sensitive, but
3. Corresponding unit tests.
With this patch, case sensitivity of database names and table names is
handled by the catalog. Case sensitivity of other identifiers is handled by the
analyzer.
----