Querying Drill with Spark DataFrame

Luqman Ghani Sat, 22 Jul 2017 13:44:19 -0700

Hi,

I'm working on integrating Apache Drill with Apache Spark through Drill's
JDBC driver. I'm trying a simple SELECT * against a Drill table through
spark.sqlContext.load with the JDBC driver. I'm running the following code in
the Spark shell:


> ./bin/spark-shell --driver-class-path
/home/ubuntu/dir/spark/jars/jackson-databind-2.6.5.jar --packages
org.apache.drill.exec:drill-jdbc-all:1.10.0

scala> val options = Map[String,String](
         "driver" -> "org.apache.drill.jdbc.Driver",
         "url" -> "jdbc:drill:drillbit=localhost:31010",
         "dbtable" -> "(SELECT * FROM dfs.root.`output.parquet`) AS Customers")

scala> val df = spark.sqlContext.load("jdbc",options)

scala> df.schema

res0: org.apache.spark.sql.types.StructType =
StructType(StructField(CustomerID,IntegerType,true),
StructField(First_name,StringType,true),
StructField(Last_name,StringType,true),
StructField(Email,StringType,true), StructField(Gender,StringType,true),
StructField(Country,StringType,true))

It gives the correct schema for the DataFrame, but when I run:

scala> df.show

I get the following error:

java.sql.SQLException: Failed to create prepared statement: PARSE ERROR:
Encountered "\"" at line 1, column 23.

Was expecting one of:

    "STREAM" ...

    "DISTINCT" ...

    "ALL" ...

    "*" ...

    "+" ...

    "-" ...

    <UNSIGNED_INTEGER_LITERAL> ...

    __MORE_DRILL_GRAMMAR__ ...


SQL Query SELECT * FROM (SELECT
"CustomerID","First_name","Last_name","Email","Gender","Country" FROM
(SELECT * FROM dfs.root.`output.parquet`) AS Customers ) LIMIT 0

The quote the parser encountered, at column 23, is the one that opens
"CustomerID" in the query.

I ran the following query in the Drill shell:

SELECT "CustomerID" from dfs.root.`output.parquet`;

It fails with the same 'Encountered "\""' error.

Is there any way to remove the "SELECT
"CustomerID","First_name","Last_name","Email","Gender","Country" FROM"
wrapper from the query that Spark generates and pushes down to Apache Drill
through the JDBC driver?

Or is there some other workaround, such as removing the quotes?
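
One direction that might address the quoting is sketched below. It is an
untested assumption, not something verified against Drill 1.10.0. Spark's JDBC
source quotes column names through a JdbcDialect, and the default
implementation wraps them in double quotes. A custom dialect registered for
jdbc:drill: URLs could emit backticks instead, the same quoting already used
for `output.parquet` in the dbtable option. DrillDialect is a hypothetical
name, not something shipped with Spark or Drill.

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical dialect (an assumption, not part of Spark or Drill):
// quote column names with backticks instead of the double quotes that
// Spark's default dialect produces.
object DrillDialect extends JdbcDialect {
  // Apply this dialect only to Drill JDBC URLs.
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:drill:")

  // Spark calls this for each column name it places in the generated SELECT.
  override def quoteIdentifier(colName: String): String =
    "`" + colName + "`"
}

// Register the dialect before creating the DataFrame so the JDBC source
// can pick it up for the jdbc:drill: URL.
JdbcDialects.registerDialect(DrillDialect)
```

If this is pasted into the same spark-shell session before the
spark.sqlContext.load call, the generated column list should read
`CustomerID`,`First_name`,... instead of "CustomerID","First_name",...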


Thanks,

Luqman
