GitHub user osidorkin opened a pull request:

    https://github.com/apache/spark/pull/6032

    [SPARK-7345][SQL] Spark cannot detect renamed columns using JDBC connector

    The issue appears when one tries to create a DataFrame via a 
sqlContext.load("jdbc"...) statement whose "dbtable" contains a query with 
renamed columns.
    If the original column is used in the SQL query once, the resulting 
DataFrame will contain the non-renamed column.
    If the original column is used in the SQL query several times with 
different aliases, sqlContext.load will fail.
    The original implementation of JDBCRDD.resolveTable uses getColumnName to 
detect column names in the RDD schema.
    The suggested implementation uses getColumnLabel instead, which is aware 
of the SQL "AS" clause and therefore handles column renames in the SQL 
statement.
    
    Readings:
    http://stackoverflow.com/questions/4271152/getcolumnlabel-vs-getcolumnname
    
http://stackoverflow.com/questions/12259829/jdbc-getcolumnname-getcolumnlabel-db2
    
    The official documentation is unfortunately a bit misleading in its 
definition of the "suggested title" purpose; however, it clearly defines the 
behavior of the AS keyword in a SQL statement.
    http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html
    getColumnLabel - Gets the designated column's suggested title for use in 
printouts and displays. The suggested title is usually specified by the SQL AS 
clause. If a SQL AS is not specified, the value returned from getColumnLabel 
will be the same as the value returned by the getColumnName method.
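The getColumnName/getColumnLabel distinction can be illustrated outside of Spark. The following standalone Python sketch is only an analogy (Python's sqlite3 DB-API exposes the alias-aware behavior that JDBC's getColumnLabel provides, with no getColumnName counterpart), but it shows why alias-aware metadata matters when the same underlying column is selected twice:

```python
# Illustration (not Spark code): why column *labels* matter when a query
# renames columns.  sqlite3's cursor.description reports the alias from "AS",
# mirroring what JDBC's getColumnLabel returns; getColumnName-style metadata
# would instead report the underlying name ("name") for both columns,
# producing a duplicate-column schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.execute("INSERT INTO people VALUES ('Ada')")

# The same underlying column selected twice under different aliases -- the
# case the pull request description says makes sqlContext.load fail.
cur = conn.execute("SELECT name AS first_name, name AS also_name FROM people")
labels = [d[0] for d in cur.description]
print(labels)  # ['first_name', 'also_name'] -- distinct, alias-aware labels
```

With getColumnName-style metadata both entries would read "name", so the resulting schema would contain duplicate fields; alias-aware labels keep them distinct.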

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/osidorkin/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6032.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6032
    
----
commit 09559a0b3eebaaabd3bb896cb3f8ca5e4ca835bc
Author: Oleg Sidorkin <[email protected]>
Date:   2015-05-09T18:08:37Z

    [SPARK-7345][SQL] Spark cannot detect renamed columns using JDBC connector
    
    The issue appears when one tries to create a DataFrame via a 
sqlContext.load("jdbc"...) statement whose "dbtable" contains a query with 
renamed columns.
    If the original column is used in the SQL query once, the resulting 
DataFrame will contain the non-renamed column.
    If the original column is used in the SQL query several times with 
different aliases, sqlContext.load will fail.
    The original implementation of JDBCRDD.resolveTable uses getColumnName to 
detect column names in the RDD schema.
    The suggested implementation uses getColumnLabel instead, which is aware 
of the SQL "AS" clause and therefore handles column renames in the SQL 
statement.
    
    Readings:
    http://stackoverflow.com/questions/4271152/getcolumnlabel-vs-getcolumnname
    
http://stackoverflow.com/questions/12259829/jdbc-getcolumnname-getcolumnlabel-db2
    
    The official documentation is unfortunately a bit misleading in its 
definition of the "suggested title" purpose; however, it clearly defines the 
behavior of the AS keyword in a SQL statement.
    http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html
    getColumnLabel - Gets the designated column's suggested title for use in 
printouts and displays. The suggested title is usually specified by the SQL AS 
clause. If a SQL AS is not specified, the value returned from getColumnLabel 
will be the same as the value returned by the getColumnName method.

----


