Re: Reading nested JSON data with Spark SQL
Oops, setting

  sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

solved the issue. Important for everyone.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Reading-nested-JSON-data-with-Spark-SQL-tp19310p20936.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
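For anyone hitting the same thing later, a minimal sketch of where the setting goes (Spark 1.2-era API; assumes an existing sqlContext, and the Parquet path is a hypothetical placeholder, so this is a fragment rather than a runnable program):

```scala
// Tell Spark SQL to read Parquet BINARY columns back as strings instead
// of raw byte arrays. Set this BEFORE loading the Parquet files.
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

// Hypothetical path, for illustration only.
val data = sqlContext.parquetFile("/path/to/tags.parquet")
```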
Re: Reading nested JSON data with Spark SQL
Also, it looks like that when I store the Strings in Parquet and try to fetch them using Spark code I get the ClassCastException. Below is how my arrays of strings are saved: each character's ASCII value is present in an array of ints.

  res25: Array[Seq[String]] = Array(
    ArrayBuffer(Array(104, 116, 116, 112, 58, 47, 47, 102, 98, 46, 109, 101, 47, 51, 67, 111, 72, 108, 99, 101, 77, 103)),
    ArrayBuffer(),
    ArrayBuffer(),
    ArrayBuffer(),
    ArrayBuffer(Array(104, 116, 116, 112, 58, 47, 47, 105, 110, 115, 116, 97, 103, 114, 97, 109, 46, 99, 111, 109, 47, 112, 47, 120, 84, 50, 51, 78, 76, 105, 85, 55, 102, 47)),
    ArrayBuffer(),
    ArrayBuffer(Array(104, 116, 116, 112, 58, 47, 47, 105, 110, 115, 116, 97, 103, 114, 97, 109, 46, 99, 111, 109, 47, 112, 47, 120, 84, 50, 53, 72, 52, 111, 90, 95, 114, 47)),
    ArrayBuffer(Array(104, 116, 116, 112, 58, 47, 47, 101, 122, 101, 101, 99, 108, 97, 115, 115, 105, 102, 105, 101, 100, 97, 100, 115, 46, 99, 111, 109, 47, 47, 100, 101, 115, 99, 47, 106, 97, 105, 112, 117, 114, 47, 49, 48, 51, 54, 50, 50,
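A quick sanity check in plain Scala (no Spark needed): the int arrays in the dump above are just character codes, so decoding the first ArrayBuffer recovers the original URL. decodeCodePoints is a throwaway helper for illustration, not anything from the Spark API:

```scala
// Each stored "string" came back as an array of character code points.
// Mapping each code to a Char and concatenating recovers the text.
def decodeCodePoints(codes: Seq[Int]): String =
  codes.map(_.toChar).mkString

// First entry from the res25 dump above.
val codes = Seq(104, 116, 116, 112, 58, 47, 47, 102, 98, 46, 109, 101,
                47, 51, 67, 111, 72, 108, 99, 101, 77, 103)
println(decodeCodePoints(codes))  // http://fb.me/3CoHlceMg
```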
Re: Reading nested JSON data with Spark SQL
Hi, I am having a similar problem and tried your solution with Spark 1.2 built with Hadoop. I am saving objects to Parquet files where some fields are of type Array. When I fetch them as below, I get:

  java.lang.ClassCastException: [B cannot be cast to java.lang.CharSequence

  def fetchTags(rows: SchemaRDD) = {
    rows.flatMap ( x => ((x.getAs[Buffer[CharSequence]](0)).map(_.toString())) )
  }

The values I am fetching were stored as an Array of Strings. I have tried replacing Buffer[CharSequence] with Array[String], Seq[String], and Seq[Seq[Char]], but I still got errors. Can you provide a clue?

Pankaj
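"[B" in that exception is the JVM name for Array[Byte]: without the binaryAsString setting, Parquet hands back raw bytes rather than strings. A plain-Scala illustration of what the explicit decode looks like (bytesToString is a throwaway helper, not a Spark API):

```scala
// Decode the raw bytes Parquet returned into a String explicitly,
// instead of casting them to CharSequence (which throws ClassCastException).
def bytesToString(b: Array[Byte]): String = new String(b, "UTF-8")

// Simulate what comes back from the Row: raw UTF-8 bytes.
val raw: Array[Byte] = "http://fb.me/3CoHlceMg".getBytes("UTF-8")
println(bytesToString(raw))  // http://fb.me/3CoHlceMg
```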
Re: Reading nested JSON data with Spark SQL
This works great, thank you!

Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini

On Wed, Nov 19, 2014 at 3:40 PM, Michael Armbrust wrote:

> You can extract the nested fields in sql: SELECT field.nestedField ...
>
> If you don't do that then nested fields are represented as rows within
> rows and can be retrieved as follows:
>
>   t.getAs[Row](0).getInt(0)
>
> Also, I would write t.getAs[Buffer[CharSequence]](12) as
> t.getAs[Seq[String]](12) since we don't guarantee the return type will be
> a buffer.
Re: Reading nested JSON data with Spark SQL
You can extract the nested fields in SQL: SELECT field.nestedField ...

If you don't do that, then nested fields are represented as rows within rows and can be retrieved as follows:

  t.getAs[Row](0).getInt(0)

Also, I would write t.getAs[Buffer[CharSequence]](12) as t.getAs[Seq[String]](12), since we don't guarantee the return type will be a buffer.

On Wed, Nov 19, 2014 at 1:33 PM, Simone Franzini wrote:

> I have been using Spark SQL to read in JSON data, like so:
>
>   val myJsonFile = sqc.jsonFile(args("myLocation"))
>   myJsonFile.registerTempTable("myTable")
>   sqc.sql("mySQLQuery").map { row =>
>     myFunction(row)
>   }
>
> And then in myFunction(row) I can read the various columns with the
> Row.getX methods. However, these methods only work for basic types
> (string, int, ...). I was having some trouble reading columns that are
> arrays or maps (i.e. other JSON objects).
>
> I am now using Spark 1.2 from the Cloudera snapshot and I noticed that
> there is a new method getAs. I was able to use it to read, for example,
> an array of strings like so:
>
>   t.getAs[Buffer[CharSequence]](12)
>
> However, if I try to read a column with a nested JSON object like this:
>
>   t.getAs[Map[String, Any]](11)
>
> I get the following error:
>
>   java.lang.ClassCastException:
>   org.apache.spark.sql.catalyst.expressions.GenericRow cannot be cast to
>   scala.collection.immutable.Map
>
> How can I read such a field? Am I just missing something small, or should
> I be looking for a completely different alternative to reading JSON?
>
> Simone Franzini, PhD
> http://www.linkedin.com/in/simonefranzini
Reading nested JSON data with Spark SQL
I have been using Spark SQL to read in JSON data, like so:

  val myJsonFile = sqc.jsonFile(args("myLocation"))
  myJsonFile.registerTempTable("myTable")
  sqc.sql("mySQLQuery").map { row =>
    myFunction(row)
  }

And then in myFunction(row) I can read the various columns with the Row.getX methods. However, these methods only work for basic types (string, int, ...). I was having some trouble reading columns that are arrays or maps (i.e. other JSON objects).

I am now using Spark 1.2 from the Cloudera snapshot and I noticed that there is a new method getAs. I was able to use it to read, for example, an array of strings like so:

  t.getAs[Buffer[CharSequence]](12)

However, if I try to read a column with a nested JSON object like this:

  t.getAs[Map[String, Any]](11)

I get the following error:

  java.lang.ClassCastException:
  org.apache.spark.sql.catalyst.expressions.GenericRow cannot be cast to
  scala.collection.immutable.Map

How can I read such a field? Am I just missing something small, or should I be looking for a completely different alternative to reading JSON?

Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini