Hello Brian,

Right now, MapType is not supported in the StructType provided to jsonRDD/jsonFile. We will add the support. I have created https://issues.apache.org/jira/browse/SPARK-4302 to track this issue.
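For anyone reading the stack trace in the quoted message below: a scala.MatchError is what a Scala pattern match throws when none of its cases covers the value it receives. What follows is only a toy sketch of that failure mode and of what adding a map case might look like. The DataType, StringType and MapType definitions and both coerce functions are hypothetical stand-ins written for this example; they are not Spark's classes and not the actual JsonRDD code.

```scala
// Toy model only: hypothetical stand-ins, NOT Spark's types or JsonRDD source.
object MapTypeSketch {
  trait DataType
  case object StringType extends DataType
  case class MapType(keyType: DataType, valueType: DataType) extends DataType

  // No case for MapType here, so passing a MapType throws scala.MatchError
  // at runtime.
  def coerceWithoutMap(value: Any, desired: DataType): Any = desired match {
    case StringType => value.toString
  }

  // Same function with a MapType case added: every key and value of the
  // incoming map is coerced recursively to the declared key/value type.
  def coerceWithMap(value: Any, desired: DataType): Any = desired match {
    case StringType => value.toString
    case MapType(keyType, valueType) =>
      value.asInstanceOf[Map[Any, Any]].map { case (k, v) =>
        coerceWithMap(k, keyType) -> coerceWithMap(v, valueType)
      }
  }
}
```

In this sketch, calling coerceWithoutMap(Map("gender" -> "m"), MapType(StringType, StringType)) throws scala.MatchError, while coerceWithMap with the same arguments returns the map with its keys and values coerced to strings.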
Thanks,
Yin

On Fri, Nov 7, 2014 at 3:41 PM, boclair <bocl...@gmail.com> wrote:

> I'm loading json into spark to create a schemaRDD (sqlContext.jsonRDD(..)).
> I'd like some of the json fields to be in a MapType rather than a sub
> StructType, as the keys will be very sparse.
>
> For example:
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>
> import sqlContext.createSchemaRDD
>
> val jsonRdd = sc.parallelize(Seq(
>   """{"key": "1234", "attributes": {"gender": "m"}}""",
>   """{"key": "4321", "attributes": {"location": "nyc"}}"""))
>
> val schemaRdd = sqlContext.jsonRDD(jsonRdd)
>
> schemaRdd.printSchema
> root
>  |-- attributes: struct (nullable = true)
>  |    |-- gender: string (nullable = true)
>  |    |-- location: string (nullable = true)
>  |-- key: string (nullable = true)
>
> schemaRdd.collect
> res1: Array[org.apache.spark.sql.Row] = Array([[m,null],1234], [[null,nyc],4321])
>
> However this isn't what I want. So I created my own StructType to pass to
> the jsonRDD call:
>
> import org.apache.spark.sql._
>
> val st = StructType(Seq(
>   StructField("key", StringType, false),
>   StructField("attributes", MapType(StringType, StringType, false))))
>
> val jsonRddSt = sc.parallelize(Seq(
>   """{"key": "1234", "attributes": {"gender": "m"}}""",
>   """{"key": "4321", "attributes": {"location": "nyc"}}"""))
>
> val schemaRddSt = sqlContext.jsonRDD(jsonRddSt, st)
>
> schemaRddSt.printSchema
> root
>  |-- key: string (nullable = false)
>  |-- attributes: map (nullable = true)
>  |    |-- key: string
>  |    |-- value: string (valueContainsNull = false)
>
> schemaRddSt.collect
> *** Failure ***
> scala.MatchError: MapType(StringType,StringType,false) (of class org.apache.spark.sql.catalyst.types.MapType)
>     at org.apache.spark.sql.json.JsonRDD$.enforceCorrectType(JsonRDD.scala:397)
>     ...
>
> The schema of the schemaRDD is correct. But it seems that the json cannot
> be coerced to a MapType. I can see at the line in the stack trace that
> there is no case statement for MapType. Is there something I'm missing? Is
> this a bug or decision to not support MapType with json?
>
> Thanks,
> Brian
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/jsonRdd-and-MapType-tp18376.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.