And you can use jsonRDD(json: RDD[String], schema: StructType) to specify your schema explicitly. For numbers larger than Long, we can use DecimalType.
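Daoyuan's suggestion above could look like the following sketch, reusing the `sqlc` and `rdd` values from the Spark-shell snippet later in this thread. This assumes a Spark 1.2-era shell where the type classes (StructType, StructField, DecimalType, etc.) are visible via `org.apache.spark.sql._`; the table name `test3` is just an illustrative choice.

```scala
// Sketch, assuming a Spark 1.2-era shell session where `sqlc` (a SQLContext)
// and `rdd` (an RDD[String] of JSON lines) are already defined as in the
// reproduction snippet below.
import org.apache.spark.sql._

val schema = StructType(Seq(
  StructField("Click", StringType, nullable = true),
  StructField("Impression", LongType, nullable = true),
  // DisplayURL values can exceed Long.MaxValue, so declare them as decimals.
  StructField("DisplayURL", DecimalType.Unlimited, nullable = true),
  StructField("AdId", LongType, nullable = true)
))

// With an explicit schema, no sampling-based type inference happens at all:
val json3 = sqlc.jsonRDD(rdd, schema)
json3.registerTempTable("test3")
sqlc.sql("SELECT * FROM test3").collect
```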
Thanks,
Daoyuan

From: Wang, Daoyuan [mailto:daoyuan.w...@intel.com]
Sent: Friday, January 16, 2015 5:14 PM
To: Tobias Pfeiffer
Cc: user
Subject: RE: MatchError in JsonRDD.toLong

The second parameter of jsonRDD is the sampling ratio used when we infer the schema.

Thanks,
Daoyuan

From: Tobias Pfeiffer [mailto:t...@preferred.jp]
Sent: Friday, January 16, 2015 5:11 PM
To: Wang, Daoyuan
Cc: user
Subject: Re: MatchError in JsonRDD.toLong

Hi,

On Fri, Jan 16, 2015 at 5:55 PM, Wang, Daoyuan <daoyuan.w...@intel.com> wrote:
> Can you provide how you create the JsonRDD?

This should be reproducible in the Spark shell:

---------------------------------------------------------
import org.apache.spark.sql._

val sqlc = new SQLContext(sc)
val rdd = sc.parallelize(
  """{"Click":"nonclicked", "Impression":1, "DisplayURL":4401798909506983219, "AdId":21215341}""" ::
  """{"Click":"nonclicked", "Impression":1, "DisplayURL":14452800566866169008, "AdId":10587781}""" :: Nil)

// works fine
val json = sqlc.jsonRDD(rdd)
json.registerTempTable("test")
sqlc.sql("SELECT * FROM test").collect

// -> MatchError
val json2 = sqlc.jsonRDD(rdd, 0.1)
json2.registerTempTable("test2")
sqlc.sql("SELECT * FROM test2").collect
---------------------------------------------------------

I guess the issue in the latter case is that the column is inferred as Long when some of the values are actually too big for a Long...

Thanks
Tobias
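Tobias's guess can be checked without Spark at all: the second DisplayURL literal in the reproduction is larger than Long.MaxValue, so a column inferred as Long cannot hold it, while an arbitrary-precision decimal can (which is why declaring the column as DecimalType avoids the problem). A minimal plain-Scala check:

```scala
// Plain Scala, no Spark needed: the second DisplayURL value from the
// reproduction snippet does not fit in a signed 64-bit Long.
val tooBig = BigInt("14452800566866169008")

println(Long.MaxValue)                   // 9223372036854775807
println(tooBig > BigInt(Long.MaxValue))  // true

// An arbitrary-precision decimal (what DecimalType is backed by) holds it exactly:
val asDecimal = BigDecimal("14452800566866169008")
println(asDecimal.toBigInt == tooBig)    // true
```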