my code is here:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    mysql_jdbc_url = 'mydb/test'
    table = "test"
    props = {"user": "myname", "password": 'mypassword'}

    df = spark.read.jdbc(mysql_jdbc_url, table, properties=props)
    df.printSchema()
    rows = df.collect()
    for i in rows:
        print(i)

2017-08-27 1:00 GMT+08:00 刘虓 <ipf...@gmail.com>:

> hi, all
> I came across this problem yesterday:
> I was using a data frame to read from an Amazon RDS MySQL table, and this
> exception came up:
>
> java.sql.SQLException: Invalid value for getLong() - 'id'
>
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:964)
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:897)
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:886)
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
>     at com.mysql.jdbc.ResultSetImpl.getLong(ResultSetImpl.java:2688)
>     at com.mysql.jdbc.ResultSetImpl.getLong(ResultSetImpl.java:2650)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.getNext(JDBCRDD.scala:447)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.hasNext(JDBCRDD.scala:544)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
>     at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>     at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.to(SerDeUtil.scala:112)
>     at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toBuffer(SerDeUtil.scala:112)
>     at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toArray(SerDeUtil.scala:112)
>     at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
>     at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>     at org.apache.spark.scheduler.Task.run(Task.scala:86)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:722)
>
> Obviously there seems to be a stray string value 'id' in the results.
>
> Has anybody seen this before?
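For context on the exception: Spark maps a MySQL BIGINT column to LongType and the Connector/J driver calls ResultSet.getLong() on each row, so a row whose cell holds the literal text 'id' (for example, a CSV header line that was loaded into the table as data) cannot be converted and the driver throws the SQLException shown above. A minimal sketch of this failure mode in plain Python, plus one common workaround — pushing a CAST down to MySQL by reading a subquery instead of the raw table. The table and column names here are assumptions, not from the original thread:

    # Mimic what the JDBC driver does per row: parse the cell as a
    # long or fail loudly, the way ResultSet.getLong() does.
    def get_long(cell):
        try:
            return int(cell)
        except ValueError:
            raise ValueError("Invalid value for getLong() - %r" % cell)

    # Hypothetical data: a header line snuck into the table as a row.
    rows = [["1"], ["2"], ["id"], ["4"]]
    good, bad = [], []
    for (cell,) in rows:
        try:
            good.append(get_long(cell))
        except ValueError:
            bad.append(cell)

    print(good)  # [1, 2, 4]
    print(bad)   # ['id']

    # Workaround sketch: wrap the table in a subquery that casts the
    # problem column to CHAR, so Spark reads it as a string and the
    # driver never calls getLong(). Names are hypothetical.
    def casted_subquery(table, column):
        """Build a JDBC 'dbtable' value that casts `column` to CHAR."""
        return "(SELECT CAST({c} AS CHAR) AS {c} FROM {t}) AS tmp".format(
            c=column, t=table)

    dbtable = casted_subquery("test", "id")
    # Then, with a real connection:
    # df = spark.read.jdbc(mysql_jdbc_url, dbtable, properties=props)

Once the offending row is identified this way, deleting or fixing it in MySQL (or casting as above) should let the original read succeed.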