Hi,

I'm not that familiar with MySQL, but does your MySQL server receive the same
query in both cases (spark-shell and spark-submit)?
Have you checked the MySQL logs?
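
One quick check you could try in both spark-shell and in the app (a rough
sketch against the Spark 2.1 developer API, not a confirmed fix): print which
JDBC dialect Spark resolves for your URL. The MySQL dialect quotes column
names with backticks, while the generic fallback quotes them with double
quotes, which is exactly the quoting shown in your error message.

import org.apache.spark.sql.jdbc.JdbcDialects

// Which dialect does Spark pick for this URL?
val url = "jdbc:mysql://127.0.0.1:3306/mydb"
val dialect = JdbcDialects.get(url)
println(dialect.getClass.getName)        // expect org.apache.spark.sql.jdbc.MySQLDialect$
println(dialect.quoteIdentifier("user")) // expect `user`; "user" would mean the generic dialect

If the two runs print different dialects, that would explain the different SQL.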

// maropu


On Thu, Jan 26, 2017 at 12:42 PM, Xuan Dzung Doan <
doanxuand...@yahoo.com.invalid> wrote:

> Hi,
>
> Spark version 2.1.0
> MySQL community server version 5.7.17
> MySQL Connector Java 5.1.40
>
> I need to save a dataframe to a MySQL table. In spark shell, the following
> statement succeeds:
>
> scala> df.write.mode(SaveMode.Append).format("jdbc").option("url",
> "jdbc:mysql://127.0.0.1:3306/mydb").option("dbtable",
> "person").option("user", "username").option("password", "password").save()
>
> I wrote an app that does essentially the same thing: it issues the same
> statement to save the same dataframe to the same MySQL table. When I run it
> with spark-submit, it fails with an SQL syntax error. Here's the detailed
> stack trace:
>
> 17/01/25 16:06:02 INFO DAGScheduler: Job 2 failed: save at DataIngestionJob.scala:119, took 0.159574 s
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 3, localhost, executor driver): java.sql.BatchUpdateException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '"user","age","state") VALUES ('user3',44,'CT')' at line 1
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
>         at com.mysql.jdbc.Util.getInstance(Util.java:408)
>         at com.mysql.jdbc.SQLError.createBatchUpdateException(SQLError.java:1162)
>         at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1773)
>         at com.mysql.jdbc.PreparedStatement.executeBatchInternal(PreparedStatement.java:1257)
>         at com.mysql.jdbc.StatementImpl.executeBatch(StatementImpl.java:958)
>         at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:597)
>         at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670)
>         at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670)
>         at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
>         at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>         at org.apache.spark.scheduler.Task.run(Task.scala:99)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '"user","age","state") VALUES ('user3',44,'CT')' at line 1
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
>         at com.mysql.jdbc.Util.getInstance(Util.java:408)
>         at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:943)
>         at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3970)
>         at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3906)
>         at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2524)
>         at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2677)
>         at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2549)
>         at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
>         at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2073)
>         at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1751)
>         ... 15 more
>
> Driver stacktrace:
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>         at scala.Option.foreach(Option.scala:257)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>         at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
>         at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:925)
>         at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:923)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>         at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:923)
>         at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply$mcV$sp(Dataset.scala:2305)
>         at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2305)
>         at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2305)
>         at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>         at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
>         at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2304)
>         at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.saveTable(JdbcUtils.scala:670)
>         at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:77)
>         at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
>         at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
>         at io.optics.analytics.dataingestion.DataIngestion.run(DataIngestionJob.scala:119)
>         at io.optics.analytics.dataingestion.DataIngestionJob$.main(DataIngestionJob.scala:28)
>         at io.optics.analytics.dataingestion.DataIngestionJob.main(DataIngestionJob.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> Any idea why it's happening? A possible bug in spark?
>
> Thanks,
> Dzung.
>
>
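
By the way, the statement in the error has the column names in double quotes
("user","age","state"), which MySQL only accepts when the ANSI_QUOTES
sql_mode is on; Spark's MySQL dialect would normally emit backticks. If the
check above shows the generic dialect being picked up only under
spark-submit, one rough, untested workaround (JdbcDialects is a developer
API, and BacktickMySQLDialect is just a name I made up) is to register a
backtick-quoting dialect before calling save():

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical helper: force backtick quoting for jdbc:mysql URLs.
object BacktickMySQLDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mysql")
  override def quoteIdentifier(colName: String): String =
    "`" + colName.replace("`", "``") + "`"
}

JdbcDialects.registerDialect(BacktickMySQLDialect)
// ...then run the same df.write ... .save() call as before.

Checking the MySQL general query log first should confirm whether the two
runs really send different SQL.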


-- 
---
Takeshi Yamamuro
