I have been using Spark SQL to read in JSON data, like so:

    val myJsonFile = sqc.jsonFile(args("myLocation"))
    myJsonFile.registerTempTable("myTable")
    sqc.sql("mySQLQuery").map { row => myFunction(row) }
Then, in myFunction(row), I can read the various columns with the Row.getX methods. However, these methods only work for basic types (string, int, ...), and I was having some trouble reading columns that are arrays or maps (i.e. nested JSON objects).

I am now using Spark 1.2 from the Cloudera snapshot, and I noticed that there is a new getAs method. I was able to use it to read, for example, an array of strings like so:

    t.getAs[Buffer[CharSequence]](12)

However, if I try to read a column containing a nested JSON object like this:

    t.getAs[Map[String, Any]](11)

I get the following error:

    java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRow cannot be cast to scala.collection.immutable.Map

How can I read such a field? Am I just missing something small, or should I be looking for a completely different alternative for reading JSON?

Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini
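For reference, here is a minimal sketch of what I am guessing might work, based purely on the error message above (which suggests the nested JSON object comes back as a Row rather than a Map). The column index 11 is from my schema, and the field access inside the nested object is hypothetical:

    import org.apache.spark.sql.Row

    def myFunction(t: Row): Unit = {
      // Guess: since the cast to Map fails with "GenericRow cannot be cast",
      // perhaps the nested JSON object should be read as a Row instead?
      val nested = t.getAs[Row](11)
      // Then read fields of the nested object positionally (index 0 is
      // just an example; my actual nested schema has several fields).
      val innerValue = nested.getString(0)
      println(innerValue)
    }

I have not verified that this is the intended way to access nested structures, so any confirmation or correction would be appreciated.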