Thank you so much for the reply, here is my code:

1.  val conf = new SparkConf().setAppName("Simple Application")
2.  conf.setMaster("local")
3.  val sc = new SparkContext(conf)
4.  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
5.  import sqlContext.createSchemaRDD
6.  val path1 = "./data/people.json"
7.  val people = sqlContext.jsonFile(path1)
8.  people.registerAsTable("people")
9.  var sql = "SELECT name FROM people WHERE schools.time>2"
10. val result = sqlContext.sql(sql)
11. result.collect().foreach(println)
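To make the intent of the query concrete, here is a quick cross-check outside Spark, in plain Python over the same three records shown below. I am reading "schools.time > 2" as "some entry in the schools array has time > 2", which may not be how Spark SQL interprets it:

```python
import json

# The same three records as in people.json, one JSON object per line.
lines = [
    '{"name":"Michael", "schools":[{"name":"ABC","time":1994},{"name":"EFG","time":2000}]}',
    '{"name":"Andy", "age":30,"scores":{"eng":98,"phy":89}}',
    '{"name":"Justin", "age":19}',
]

people = [json.loads(line) for line in lines]

# Intended meaning of: SELECT name FROM people WHERE schools.time>2
# Keep rows where at least one schools entry has time > 2; rows with
# no schools array (Andy, Justin) are filtered out.
result = [p["name"] for p in people
          if any(s.get("time", 0) > 2 for s in p.get("schools", []))]

print(result)  # ['Michael']
```

So the row I expect back is Michael's, whatever the right Spark SQL syntax turns out to be.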
The content of people.json is:

{"name":"Michael", "schools":[{"name":"ABC","time":1994},{"name":"EFG","time":2000}]}
{"name":"Andy", "age":30,"scores":{"eng":98,"phy":89}}
{"name":"Justin", "age":19}

What I have tried:

*1. Use HiveQL:*
I replaced line 4 with
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
and line 10 with
    val result = sqlContext.hql(sql)
(I recompiled the Spark jar with Hive support), but I got what seems to be the same error.

*2. Use [] for array access:*
I replaced line 9 with
    var sql = "SELECT name FROM people WHERE schools[0].time>2"
but got this error:

14/07/15 14:37:49 INFO SparkContext: Job finished: reduce at JsonRDD.scala:40, took 0.98412 s
Exception in thread "main" java.lang.RuntimeException: [1.41] failure: ``UNION'' expected but identifier .time found

SELECT name FROM people WHERE schools[0].time>2
                                        ^
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
    at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:69)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:185)
    at SimpleApp$.main(SimpleApp.scala:32)
    at SimpleApp.main(SimpleApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)

So it seems this syntax is not supported either.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Query-the-nested-JSON-data-With-Spark-SQL-1-0-1-tp9544p9731.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.