From the Cloudera Documentation: https://docs.cloudera.com/documentation/other/connectors/hive-jdbc/latest/Cloudera-JDBC-Driver-for-Apache-Hive-Install-Guide.pdf
UseNativeQuery
  1: The driver does not transform the queries emitted by applications, so the
     native query is used.
  0: The driver transforms the queries emitted by applications and converts
     them into an equivalent form in HiveQL.

Try to change the "UseNativeQuery" parameter and see if it works :)

On Tue, Jul 20, 2021 at 1:26 PM Daniel de Oliveira Mantovani <
daniel.oliveira.mantov...@gmail.com> wrote:

> Insert mode is "overwrite"; it shouldn't matter whether the table
> already exists or not. The JDBC driver should be based on the Cloudera
> Hive version, and we can't know the CDH version he's using.
>
> On Tue, Jul 20, 2021 at 1:21 PM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> The driver is fine and up to date, and it should work.
>>
>> I have asked the thread owner to send the DDL of the table and how the
>> table is created. In this case JDBC from Spark expects the table to be
>> there.
>>
>> The error below
>>
>> java.sql.SQLException: [Cloudera][HiveJDBCDriver](500051) ERROR
>> processing query/statement. Error Code: 40000, SQL state:
>> TStatus(statusCode:ERROR_STATUS,
>> infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error while
>> compiling statement: FAILED: ParseException line 1:39 cannot recognize
>> input near '"first_name"' 'TEXT' ',' in column name or primary key or
>> foreign key:28:27,
>> org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:329
>>
>> sounds like a mismatch between the columns in the Spark DataFrame and
>> the underlying table.
>>
>> HTH
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
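To try both settings of the flag without hand-editing connection strings, a small sketch can toggle it in the JDBC URL. The helper name and logic below are my own illustration, not part of Spark or the Cloudera driver:

```scala
// Hypothetical helper (not a Spark or Cloudera API): set or replace the
// UseNativeQuery flag in a Hive JDBC URL so both values can be tried.
def setUseNativeQuery(url: String, value: Int): String =
  if (url.contains("UseNativeQuery="))
    url.replaceAll("UseNativeQuery=\\d+", s"UseNativeQuery=$value")
  else
    s"$url;UseNativeQuery=$value"

val original = "jdbc:hive2://localhost:10000/foundation;AuthMech=2;UseNativeQuery=0"
val tweaked  = setUseNativeQuery(original, 1)
// tweaked: jdbc:hive2://localhost:10000/foundation;AuthMech=2;UseNativeQuery=1
```

With UseNativeQuery=1 the driver passes the statement through untouched, which only helps if the application already emits valid HiveQL.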
>>
>> On Tue, 20 Jul 2021 at 17:05, Daniel de Oliveira Mantovani <
>> daniel.oliveira.mantov...@gmail.com> wrote:
>>
>>> Badrinath is trying to write to Hive in a cluster where he doesn't
>>> have permission to submit Spark jobs, and he doesn't have Hive/Spark
>>> metadata access. The only way to communicate with this third-party
>>> Hive cluster is through the JDBC protocol.
>>>
>>> [ Cloudera Data Hub - Hive Server ] <-> [ Spark Standalone ]
>>>
>>> The one creating this table is Spark, because he's using "overwrite"
>>> in order to test it.
>>>
>>> df.write
>>>   .format("jdbc")
>>>   .option("url", "jdbc:hive2://localhost:10000/foundation;AuthMech=2;UseNativeQuery=0")
>>>   .option("dbtable", "test.test")
>>>   .option("user", "admin")
>>>   .option("password", "admin")
>>>   .option("driver", "com.cloudera.hive.jdbc41.HS2Driver")
>>>   .mode("overwrite")
>>>   .save()
>>>
>>> This error is weird; it looks like the third-party Hive server isn't
>>> able to recognize the SQL dialect coming from the [Spark Standalone]
>>> server's JDBC driver.
>>>
>>> 1) I would try to execute the CREATE statement manually on this server.
>>> 2) If that works, try to run again with the "append" option.
>>>
>>> I would also open a case with Cloudera and ask which driver you should use.
>>>
>>> Thanks
>>>
>>> On Mon, Jul 19, 2021 at 10:33 AM Artemis User <arte...@dtechspace.com>
>>> wrote:
>>>
>>>> As Mich mentioned, there is no need to use the JDBC API; using the
>>>> DataFrameWriter's saveAsTable method is the way to go. The JDBC driver
>>>> is for a JDBC client (a Java client, for instance) to access the Hive
>>>> tables in Spark via the Thrift server interface.
>>>>
>>>> -- ND
>>>>
>>>> On 7/19/21 2:42 AM, Badrinath Patchikolla wrote:
>>>>
>>>> I have been trying to create a table in Hive from Spark itself.
>>>>
>>>> In local mode it works; what I am trying here is, from Spark
>>>> standalone, to create a managed table in Hive (another Spark cluster,
>>>> basically CDH) using JDBC mode.
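The manual CREATE statement suggested in step 1 above would need HiveQL syntax rather than the ANSI DDL the driver generated (double-quoted identifiers and TEXT columns are what Hive's parser rejects). As an illustration only (toHiveDdl is a made-up helper, not any real API), the rewrite amounts to backtick quoting and STRING columns:

```scala
// Illustrative only -- not a Spark or Hive API. Rewrites the statement the
// JDBC path generated (ANSI double-quoted identifiers, TEXT columns) into
// the form the HiveQL parser accepts (backtick quoting, STRING columns).
def toHiveDdl(ddl: String): String =
  ddl.replace("\"", "`").replace(" TEXT", " STRING")

val generated = """CREATE TABLE test.test ("first_name" TEXT , "last_name" TEXT , "country" TEXT )"""
val hiveDdl   = toHiveDdl(generated)
// hiveDdl: CREATE TABLE test.test (`first_name` STRING , `last_name` STRING , `country` STRING )
```

Running the rewritten statement in beeline first, then retrying the Spark write with "append", would confirm whether the dialect mismatch is the only problem.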
>>>>
>>>> When I try that, below is the error I am facing.
>>>>
>>>> On Thu, 15 Jul, 2021, 9:55 pm Mich Talebzadeh, <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Have you created that table in Hive, or are you trying to create it
>>>>> from Spark itself?
>>>>>
>>>>> Your Hive is local. In this case you don't need a JDBC connection.
>>>>> Have you tried:
>>>>>
>>>>> df2.write.mode("overwrite").saveAsTable("mydb.mytable")
>>>>>
>>>>> HTH
>>>>>
>>>>> On Thu, 15 Jul 2021 at 12:51, Badrinath Patchikolla <
>>>>> pbadrinath1...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to write data from Spark to Hive in JDBC mode; below is
>>>>>> the sample code:
>>>>>>
>>>>>> Spark standalone, version 2.4.7
>>>>>>
>>>>>> 21/07/15 08:04:07 WARN util.NativeCodeLoader: Unable to load
>>>>>> native-hadoop library for your platform... using builtin-java classes
>>>>>> where applicable
>>>>>> Setting default log level to "WARN".
>>>>>> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
>>>>>> setLogLevel(newLevel).
>>>>>> Spark context Web UI available at http://localhost:4040
>>>>>> Spark context available as 'sc' (master = spark://localhost:7077, app
>>>>>> id = app-20210715080414-0817).
>>>>>> Spark session available as 'spark'.
>>>>>> Welcome to
>>>>>>       ____              __
>>>>>>      / __/__  ___ _____/ /__
>>>>>>     _\ \/ _ \/ _ `/ __/ '_/
>>>>>>    /___/ .__/\_,_/_/ /_/\_\   version 2.4.7
>>>>>>       /_/
>>>>>>
>>>>>> Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
>>>>>> Type in expressions to have them evaluated.
>>>>>> Type :help for more information.
>>>>>>
>>>>>> scala> :paste
>>>>>> // Entering paste mode (ctrl-D to finish)
>>>>>>
>>>>>> val df = Seq(
>>>>>>   ("John", "Smith", "London"),
>>>>>>   ("David", "Jones", "India"),
>>>>>>   ("Michael", "Johnson", "Indonesia"),
>>>>>>   ("Chris", "Lee", "Brazil"),
>>>>>>   ("Mike", "Brown", "Russia")
>>>>>> ).toDF("first_name", "last_name", "country")
>>>>>>
>>>>>> df.write
>>>>>>   .format("jdbc")
>>>>>>   .option("url", "jdbc:hive2://localhost:10000/foundation;AuthMech=2;UseNativeQuery=0")
>>>>>>   .option("dbtable", "test.test")
>>>>>>   .option("user", "admin")
>>>>>>   .option("password", "admin")
>>>>>>   .option("driver", "com.cloudera.hive.jdbc41.HS2Driver")
>>>>>>   .mode("overwrite")
>>>>>>   .save()
>>>>>>
>>>>>> // Exiting paste mode, now interpreting.
>>>>>>
>>>>>> java.sql.SQLException: [Cloudera][HiveJDBCDriver](500051) ERROR
>>>>>> processing query/statement.
>>>>>> Error Code: 40000, SQL state: TStatus(statusCode:ERROR_STATUS,
>>>>>> infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error while
>>>>>> compiling statement: FAILED: ParseException line 1:39 cannot recognize
>>>>>> input near '"first_name"' 'TEXT' ',' in column name or primary key or
>>>>>> foreign key:28:27,
>>>>>> org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:329,
>>>>>> org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:207,
>>>>>> org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:290,
>>>>>> org.apache.hive.service.cli.operation.Operation:run:Operation.java:260,
>>>>>> org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:504,
>>>>>> org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementAsync:HiveSessionImpl.java:490,
>>>>>> sun.reflect.GeneratedMethodAccessor13:invoke::-1,
>>>>>> sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43,
>>>>>> java.lang.reflect.Method:invoke:Method.java:498,
>>>>>> org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78,
>>>>>> org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36,
>>>>>> org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63,
>>>>>> java.security.AccessController:doPrivileged:AccessController.java:-2,
>>>>>> javax.security.auth.Subject:doAs:Subject.java:422,
>>>>>> org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1875,
>>>>>> org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59,
>>>>>> com.sun.proxy.$Proxy35:executeStatementAsync::-1,
>>>>>> org.apache.hive.service.cli.CLIService:executeStatementAsync:CLIService.java:295,
>>>>>> org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:507,
>>>>>> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1437,
>>>>>> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1422,
>>>>>> org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39,
>>>>>> org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39,
>>>>>> org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56,
>>>>>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286,
>>>>>> java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149,
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624,
>>>>>> java.lang.Thread:run:Thread.java:748,
>>>>>> *org.apache.hadoop.hive.ql.parse.ParseException:line 1:39 cannot
>>>>>> recognize input near '"first_name"' 'TEXT' ',' in column name or primary
>>>>>> key or foreign key:33:6,
>>>>>> org.apache.hadoop.hive.ql.parse.ParseDriver:parse:ParseDriver.java:221,
>>>>>> org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:75,
>>>>>> org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:68,
>>>>>> org.apache.hadoop.hive.ql.Driver:compile:Driver.java:564,
>>>>>> org.apache.hadoop.hive.ql.Driver:compileInternal:Driver.java:1425,
>>>>>> org.apache.hadoop.hive.ql.Driver:compileAndRespond:Driver.java:1398,
>>>>>> org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:205],
>>>>>> sqlState:42000, errorCode:40000, errorMessage:Error while compiling
>>>>>> statement: FAILED: ParseException line 1:39 cannot recognize input near
>>>>>> '"first_name"' 'TEXT' ',' in column name or primary key or foreign key),
>>>>>> Query: CREATE TABLE test.test("first_name" TEXT , "last_name" TEXT ,
>>>>>> "country" TEXT ).
>>>>>>   at com.cloudera.hiveserver2.hivecommon.api.HS2Client.executeStatementInternal(Unknown Source)
>>>>>>   at com.cloudera.hiveserver2.hivecommon.api.HS2Client.executeStatement(Unknown Source)
>>>>>>   at com.cloudera.hiveserver2.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.executeHelper(Unknown Source)
>>>>>>   at com.cloudera.hiveserver2.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.execute(Unknown Source)
>>>>>>   at com.cloudera.hiveserver2.jdbc.common.SStatement.executeNoParams(Unknown Source)
>>>>>>   at com.cloudera.hiveserver2.jdbc.common.SStatement.executeAnyUpdate(Unknown Source)
>>>>>>   at com.cloudera.hiveserver2.jdbc.common.SStatement.executeUpdate(Unknown Source)
>>>>>>   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:863)
>>>>>>   at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:81)
>>>>>>   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>>>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>>>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>>>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>>>>>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>>>>>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>>>>>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>>>>>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>>>>>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>>>>>>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>>>>>>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
>>>>>>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
>>>>>>   at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
>>>>>>   at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
>>>>>>   at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
>>>>>>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
>>>>>>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
>>>>>>   at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
>>>>>>   at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
>>>>>>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
>>>>>>   ... 48 elided
>>>>>> Caused by:
>>>>>> com.cloudera.hiveserver2.support.exceptions.GeneralException:
>>>>>> [Cloudera][HiveJDBCDriver](500051) ERROR processing query/statement.
>>>>>> Error Code: 40000, SQL state: TStatus(statusCode:ERROR_STATUS,
>>>>>> infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error while
>>>>>> compiling statement: FAILED: ParseException line 1:39 cannot recognize
>>>>>> input near '"first_name"' 'TEXT' ',' in column name or primary key or
>>>>>> foreign key:28:27,
>>>>>> org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:329,
>>>>>> org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:207,
>>>>>> org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:290,
>>>>>> org.apache.hive.service.cli.operation.Operation:run:Operation.java:260,
>>>>>> org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:504,
>>>>>> org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementAsync:HiveSessionImpl.java:490,
>>>>>> sun.reflect.GeneratedMethodAccessor13:invoke::-1,
>>>>>> sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43,
>>>>>> java.lang.reflect.Method:invoke:Method.java:498,
>>>>>> org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78,
>>>>>> org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36,
>>>>>> org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63,
>>>>>> java.security.AccessController:doPrivileged:AccessController.java:-2,
>>>>>> javax.security.auth.Subject:doAs:Subject.java:422,
>>>>>> org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1875,
>>>>>> org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59,
>>>>>> com.sun.proxy.$Proxy35:executeStatementAsync::-1,
>>>>>> org.apache.hive.service.cli.CLIService:executeStatementAsync:CLIService.java:295,
>>>>>> org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:507,
>>>>>> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1437,
>>>>>> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1422,
>>>>>> org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39,
>>>>>> org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39,
>>>>>> org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56,
>>>>>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286,
>>>>>> java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149,
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624,
>>>>>> java.lang.Thread:run:Thread.java:748,
>>>>>> *org.apache.hadoop.hive.ql.parse.ParseException:line 1:39 cannot
>>>>>> recognize input near '"first_name"' 'TEXT' ',' in column name or primary
>>>>>> key or foreign key:33:6,
>>>>>> org.apache.hadoop.hive.ql.parse.ParseDriver:parse:ParseDriver.java:221,
>>>>>> org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:75,
>>>>>> org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:68,
>>>>>> org.apache.hadoop.hive.ql.Driver:compile:Driver.java:564,
>>>>>> org.apache.hadoop.hive.ql.Driver:compileInternal:Driver.java:1425,
>>>>>> org.apache.hadoop.hive.ql.Driver:compileAndRespond:Driver.java:1398,
>>>>>> org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:205],
>>>>>> sqlState:42000, errorCode:40000, errorMessage:Error while compiling
>>>>>> statement: FAILED: ParseException line 1:39 cannot recognize input near
>>>>>> '"first_name"' 'TEXT' ',' in column name or primary key or foreign key),
>>>>>> Query: CREATE TABLE profile_test.person_test ("first_name" TEXT ,
>>>>>> "last_name" TEXT , "country" TEXT ).
>>>>>>   ... 77 more
>>>>>>
>>>>>> I found a similar issue in Jira:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/SPARK-31614
>>>>>>
>>>>>> There are no comments on it and the resolution is "Incomplete". Is
>>>>>> there any way in Spark to write data into Hive in JDBC mode?
>>>>>>
>>>>>> Thanks for any help.
>>>>>>
>>>>>> Thanks,
>>>>>> Badrinath.

--
Daniel Mantovani
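Since SPARK-31614 was closed without a fix, one plausible workaround is to register a custom JDBC dialect so Spark emits Hive-compatible DDL (backtick-quoted identifiers, STRING instead of TEXT). The sketch below uses a stand-in trait so it runs without Spark on the classpath; in a real job you would instead extend org.apache.spark.sql.jdbc.JdbcDialect, also override getJDBCType to map StringType to STRING, and call JdbcDialects.registerDialect(HiveDialect) before the df.write:

```scala
// Sketch of a custom-dialect workaround for SPARK-31614. DialectLike is a
// minimal stand-in for org.apache.spark.sql.jdbc.JdbcDialect so the sketch
// is self-contained; the real class has the same canHandle/quoteIdentifier
// signatures, plus getJDBCType for the TEXT -> STRING mapping.
trait DialectLike {
  def canHandle(url: String): Boolean          // which JDBC URLs this dialect owns
  def quoteIdentifier(colName: String): String // how column names are quoted in DDL
}

object HiveDialect extends DialectLike {
  // Claim hive2 URLs so Spark would pick this dialect over the default one,
  // whose ANSI double-quote style is what HiveQL fails to parse.
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")
  // HiveQL quotes identifiers with backticks, not double quotes.
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}
```

With a dialect like this registered, the generated statement would use `first_name` rather than "first_name", which is the exact token the ParseException above trips over.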