my code is here:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    mysql_jdbc_url = 'mydb/test'
    table = "test"
    props = {"user": "myname", "password": 'mypassword'}

    df = spark.read.jdbc(mysql_jdbc_url, table, properties=props)
    df.printSchema()
    rows = df.collect()
    for i in rows:
        print(i)

2017-08-27 1:00 GMT+08:00 刘虓 <ipf...@gmail.com>:

> hi, all
> I came across this problem yesterday:
> I was using a data frame to read from an Amazon RDS MySQL table, and this
> exception came up:
>
> java.sql.SQLException: Invalid value for getLong() - 'id'
>
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:964)
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:897)
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:886)
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
>     at com.mysql.jdbc.ResultSetImpl.getLong(ResultSetImpl.java:2688)
>     at com.mysql.jdbc.ResultSetImpl.getLong(ResultSetImpl.java:2650)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.getNext(JDBCRDD.scala:447)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.hasNext(JDBCRDD.scala:544)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
>     at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>     at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.to(SerDeUtil.scala:112)
>     at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toBuffer(SerDeUtil.scala:112)
>     at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toArray(SerDeUtil.scala:112)
>     at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
>     at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>     at org.apache.spark.scheduler.Task.run(Task.scala:86)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:722)
>
> Obviously there seems to be a stray string value 'id' in the results.
>
> Has anybody seen this before?
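For context on the exception: Spark maps a MySQL BIGINT column to LongType and the Connector/J driver calls ResultSet.getLong() on each row, so a row whose cell holds the literal text 'id' (for example, a CSV header line that was loaded into the table as data) cannot be converted and the driver throws the SQLException shown above. A minimal sketch of this failure mode in plain Python, plus one common workaround — pushing a CAST down to MySQL by reading a subquery instead of the raw table. The table and column names here are assumptions, not from the original thread:

    # Mimic what the JDBC driver does per row: parse the cell as a
    # long or fail loudly, the way ResultSet.getLong() does.
    def get_long(cell):
        try:
            return int(cell)
        except ValueError:
            raise ValueError("Invalid value for getLong() - %r" % cell)

    # Hypothetical data: a header line snuck into the table as a row.
    rows = [["1"], ["2"], ["id"], ["4"]]
    good, bad = [], []
    for (cell,) in rows:
        try:
            good.append(get_long(cell))
        except ValueError:
            bad.append(cell)

    print(good)  # [1, 2, 4]
    print(bad)   # ['id']

    # Workaround sketch: wrap the table in a subquery that casts the
    # problem column to CHAR, so Spark reads it as a string and the
    # driver never calls getLong(). Names are hypothetical.
    def casted_subquery(table, column):
        """Build a JDBC 'dbtable' value that casts `column` to CHAR."""
        return "(SELECT CAST({c} AS CHAR) AS {c} FROM {t}) AS tmp".format(
            c=column, t=table)

    dbtable = casted_subquery("test", "id")
    # Then, with a real connection:
    # df = spark.read.jdbc(mysql_jdbc_url, dbtable, properties=props)

Once the offending row is identified this way, deleting or fixing it in MySQL (or casting as above) should let the original read succeed.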