Arghya Saha created IMPALA-5782:
-----------------------------------

             Summary: Issue with WITH clause with Cloudera JDBC Driver for 
Impala - Returning column name instead of actual Data
                 Key: IMPALA-5782
                 URL: https://issues.apache.org/jira/browse/IMPALA-5782
             Project: IMPALA
          Issue Type: Bug
    Affects Versions: Impala 2.8.0
            Reporter: Arghya Saha


I am using the Cloudera JDBC Driver for Impala v2.5.38 with Spark 1.6.0 to create a 
DataFrame. It works for all queries except those that use a WITH clause, and WITH 
is used extensively in my organization. My code snippet is below.

import org.apache.spark.sql.DataFrame

// sqlContext is the SQLContext provided by spark-shell.
def jdbcHDFS(url: String, sql: String): DataFrame = {
  val jdbcURL = s"jdbc:impala://$url"
  val connectionProperties = new java.util.Properties()
  connectionProperties.setProperty("driver", "com.cloudera.impala.jdbc41.Driver")
  // The query is passed as a parenthesized derived table aliased as ST.
  sqlContext.read.jdbc(jdbcURL, s"($sql) AS ST", connectionProperties)
}

Below are examples of a working and a non-working SQL query:

val workingSQL = "select empname from (select * from employee) as tmp"
val nonWorkingSQL = "WITH tmp as (select * from employee) select empname from tmp"

Below is the output of rddDF.first for each of the above queries.

For workingSQL

scala> rddDF.first
res8: org.apache.spark.sql.Row = [Kushal]
For nonWorkingSQL

scala> rddDF.first
res8: org.apache.spark.sql.Row = [empname]
// Here we expect the actual data, i.e. 'Kushal', as in the output of the
// previous query, not the column name.

It would be really helpful if anyone could suggest a solution.

Please note: both queries work fine in impala-shell as well as in Hive through 
Hue.
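
For reference, the sketch below prints the table expression that jdbcHDFS hands to sqlContext.read.jdbc for each of the two queries above. The object DerivedTableDemo and the helper asDerivedTable are hypothetical names introduced only for this illustration; the helper simply mirrors the s"($sql) AS ST" wrapping in my snippet and needs neither Spark nor the driver. For the non-working query, the WITH clause ends up nested inside a parenthesized derived table. I have not confirmed that this is the cause, but it is the only structural difference between the two statements.

```scala
object DerivedTableDemo {
  // Hypothetical helper: mirrors the s"($sql) AS ST" wrapping used in jdbcHDFS.
  def asDerivedTable(sql: String): String = s"($sql) AS ST"

  val workingSQL = "select empname from (select * from employee) as tmp"
  val nonWorkingSQL = "WITH tmp as (select * from employee) select empname from tmp"

  def main(args: Array[String]): Unit = {
    // Table expressions passed as the second argument of sqlContext.read.jdbc.
    println(asDerivedTable(workingSQL))
    // prints: (select empname from (select * from employee) as tmp) AS ST
    println(asDerivedTable(nonWorkingSQL))
    // prints: (WITH tmp as (select * from employee) select empname from tmp) AS ST
  }
}
```

Running DerivedTableDemo.main(Array()) in spark-shell shows both expressions side by side, which may help when comparing what reaches the driver in each case.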



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
