BTW, do we have to register the JdbcDialect for every Spark SQLContext, or only once per Spark server?
On Sun, Jul 23, 2017 at 2:26 AM, Luqman Ghani <lgsa...@gmail.com> wrote:

> I have found the solution for this error. I had to register a JdbcDialect
> for Drill, as mentioned in the following post on SO:
>
> https://stackoverflow.com/questions/35476076/integrating-spark-sql-and-apache-drill-through-jdbc
>
> Thanks
>
> On Sun, Jul 23, 2017 at 2:10 AM, Luqman Ghani <lgsa...@gmail.com> wrote:
>
>> I have done that, but Spark still wraps my query in the same clause:
>> SELECT "CustomerID", etc. FROM (my query from table), so I get the same error.
>>
>> On Sun, Jul 23, 2017 at 2:02 AM, ayan guha <guha.a...@gmail.com> wrote:
>>
>>> You can formulate a query in the dbtable clause of the JDBC reader.
>>>
>>> On Sun, 23 Jul 2017 at 6:43 am, Luqman Ghani <lgsa...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm working on integrating Apache Drill with Apache Spark through Drill's
>>>> JDBC driver. I'm trying a simple SELECT * FROM table against Drill through
>>>> spark.sqlContext.load via the JDBC driver. I'm running the following in
>>>> the Spark shell:
>>>>
>>>> ./bin/spark-shell \
>>>>   --driver-class-path /home/ubuntu/dir/spark/jars/jackson-databind-2.6.5.jar \
>>>>   --packages org.apache.drill.exec:drill-jdbc-all:1.10.0
>>>>
>>>> scala> val options = Map[String,String](
>>>>   "driver" -> "org.apache.drill.jdbc.Driver",
>>>>   "url" -> "jdbc:drill:drillbit=localhost:31010",
>>>>   "dbtable" -> "(SELECT * FROM dfs.root.`output.parquet`) AS Customers")
>>>>
>>>> scala> val df = spark.sqlContext.load("jdbc", options)
>>>>
>>>> scala> df.schema
>>>> res0: org.apache.spark.sql.types.StructType = StructType(
>>>>   StructField(CustomerID,IntegerType,true),
>>>>   StructField(First_name,StringType,true),
>>>>   StructField(Last_name,StringType,true),
>>>>   StructField(Email,StringType,true),
>>>>   StructField(Gender,StringType,true),
>>>>   StructField(Country,StringType,true))
>>>>
>>>> This gives the correct schema for the DataFrame, but when I do:
>>>>
>>>> scala> df.show
>>>>
>>>> I am facing the following error:
>>>>
>>>> java.sql.SQLException: Failed to create prepared statement: PARSE ERROR:
>>>> Encountered "\"" at line 1, column 23.
>>>>
>>>> Was expecting one of:
>>>>     "STREAM" ...
>>>>     "DISTINCT" ...
>>>>     "ALL" ...
>>>>     "*" ...
>>>>     "+" ...
>>>>     "-" ...
>>>>     <UNSIGNED_INTEGER_LITERAL> ...
>>>>     __MORE_DRILL_GRAMMAR__ ...
>>>>
>>>> SQL Query: SELECT * FROM (SELECT "CustomerID","First_name","Last_name",
>>>> "Email","Gender","Country" FROM (SELECT * FROM dfs.root.`output.parquet`)
>>>> AS Customers) LIMIT 0
>>>>
>>>> The quote that was "Encountered" is the one at "CustomerID" in the query.
>>>>
>>>> I tried running the following query in the Drill shell:
>>>>
>>>> SELECT "CustomerID" FROM dfs.root.`output.parquet`;
>>>>
>>>> It gives the same error: 'Encountered "\"" '.
>>>>
>>>> I want to ask if there is any way to remove the
>>>> SELECT "CustomerID","First_name","Last_name","Email","Gender","Country" FROM
>>>> wrapper that Spark adds to the query it pushes down to Apache Drill via the
>>>> JDBC driver.
>>>>
>>>> Or any other way around this, like removing the quotes?
>>>>
>>>> Thanks,
>>>>
>>>> Luqman
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
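[For the archive: the dialect registration referred to above can be sketched roughly as below. This is a minimal sketch, not the exact code from the SO post; the object name DrillDialect is my own. The idea is that Spark's default JdbcDialect quotes identifiers with double quotes, which Drill's parser rejects, so we override quoteIdentifier to emit backticks instead.]

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical dialect for Drill: claim any jdbc:drill URL and
// quote column names with backticks instead of double quotes.
case object DrillDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:drill")

  override def quoteIdentifier(colName: String): String =
    s"`$colName`"
}

// Register once before reading; applies to any URL canHandle accepts.
JdbcDialects.registerDialect(DrillDialect)
```

After this registration, the pushed-down query should read SELECT `CustomerID`,`First_name`,... FROM ..., which Drill parses.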