GitHub user osidorkin opened a pull request:
https://github.com/apache/spark/pull/6032
[SPARK-7345][SQL] Spark cannot detect renamed columns using JDBC connector
The issue appears when one tries to create a DataFrame using a
sqlContext.load("jdbc"...) statement whose "dbtable" option contains a query
with renamed (aliased) columns.
If the original column is used once in the SQL query, the resulting DataFrame
will contain the original column name instead of the alias.
If the original column is used several times in the SQL query with different
aliases, sqlContext.load will fail.
The original implementation of JDBCRDD.resolveTable uses getColumnName to
detect column names for the RDD schema.
The suggested implementation uses getColumnLabel, which is aware of the SQL
"AS" clause, to handle column renames in the SQL statement.
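To illustrate the difference, here is a minimal Java sketch. It is not the patch itself: the class ColumnLabelSketch and the helpers fieldNames and stubMetaData are hypothetical names made up for this example. The stub assumes a driver that reports the underlying column from getColumnName and the alias from getColumnLabel; real drivers vary in how they handle this.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class ColumnLabelSketch {

    /** Collects schema field names, either from column labels (aliases) or from column names. */
    public static List<String> fieldNames(ResultSetMetaData md, boolean useLabel)
            throws SQLException {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= md.getColumnCount(); i++) { // JDBC column indexes are 1-based
            names.add(useLabel ? md.getColumnLabel(i) : md.getColumnName(i));
        }
        return names;
    }

    /**
     * Hypothetical stand-in for driver metadata (assumption: the driver returns the
     * underlying column from getColumnName and the alias from getColumnLabel).
     */
    public static ResultSetMetaData stubMetaData(String[] names, String[] labels) {
        return (ResultSetMetaData) Proxy.newProxyInstance(
                ColumnLabelSketch.class.getClassLoader(),
                new Class<?>[] {ResultSetMetaData.class},
                (proxy, method, args) -> {
                    switch (method.getName()) {
                        case "getColumnCount": return names.length;
                        case "getColumnName":  return names[(Integer) args[0] - 1];
                        case "getColumnLabel": return labels[(Integer) args[0] - 1];
                        default: throw new UnsupportedOperationException(method.getName());
                    }
                });
    }

    public static void main(String[] args) throws SQLException {
        // Metadata as it might look for: SELECT id AS a, id AS b FROM t
        ResultSetMetaData md =
                stubMetaData(new String[] {"id", "id"}, new String[] {"a", "b"});
        System.out.println(fieldNames(md, false)); // prints [id, id]
        System.out.println(fieldNames(md, true));  // prints [a, b]
    }
}
```

With getColumnName both fields come out as "id", which matches the duplicate-name failure described above; with getColumnLabel the aliases "a" and "b" are preserved.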
Readings:
http://stackoverflow.com/questions/4271152/getcolumnlabel-vs-getcolumnname
http://stackoverflow.com/questions/12259829/jdbc-getcolumnname-getcolumnlabel-db2
The official documentation is unfortunately a bit misleading in how it defines
the purpose of the "suggested title"; however, it clearly defines the behavior
of the AS keyword in an SQL statement:
http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html
getColumnLabel - Gets the designated column's suggested title for use in
printouts and displays. The suggested title is usually specified by the SQL AS
clause. If a SQL AS is not specified, the value returned from getColumnLabel
will be the same as the value returned by the getColumnName method.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/osidorkin/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6032.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6032
----
commit 09559a0b3eebaaabd3bb896cb3f8ca5e4ca835bc
Author: Oleg Sidorkin <[email protected]>
Date: 2015-05-09T18:08:37Z
[SPARK-7345][SQL] Spark cannot detect renamed columns using JDBC connector
----