Sorry for the trouble. There are two issues here:

- Parsing of repeated nested fields (i.e. something[0].field) is not
  supported in the plain SQL parser. SPARK-2096
  <https://issues.apache.org/jira/browse/SPARK-2096>
- Resolution of nested fields is broken in the HiveQL parser. SPARK-2483
  <https://issues.apache.org/jira/browse/SPARK-2483>
The latter issue is fixed now: #1411
<https://github.com/apache/spark/pull/1411>

Michael

On Mon, Jul 14, 2014 at 11:38 PM, anyweil <wei...@gmail.com> wrote:

> Thank you so much for the reply, here is my code:
>
> 1.  val conf = new SparkConf().setAppName("Simple Application")
> 2.  conf.setMaster("local")
> 3.  val sc = new SparkContext(conf)
> 4.  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> 5.  import sqlContext.createSchemaRDD
> 6.  val path1 = "./data/people.json"
> 7.  val people = sqlContext.jsonFile(path1)
> 8.  people.registerAsTable("people")
> 9.  var sql = "SELECT name FROM people WHERE schools.time>2"
> 10. val result = sqlContext.sql(sql)
> 11. result.collect().foreach(println)
>
> The content of people.json is:
>
> {"name":"Michael",
>  "schools":[{"name":"ABC","time":1994},{"name":"EFG","time":2000}]}
> {"name":"Andy", "age":30, "scores":{"eng":98,"phy":89}}
> {"name":"Justin", "age":19}
>
> What I have tried is:
>
> *1. Use HiveQL:*
> I have tried to replace line 4 with
>     val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
> and line 10 with
>     val result = sqlContext.hql(sql)
> (I have recompiled the Spark jar with Hive support), but I got the same
> error.
>
> *2. Use [] for the access:*
> I have tried to replace line 9 with
>     var sql = "SELECT name FROM people WHERE schools[0].time>2"
> but got the error:
>
> 14/07/15 14:37:49 INFO SparkContext: Job finished: reduce at
> JsonRDD.scala:40, took 0.98412 s
> Exception in thread "main" java.lang.RuntimeException: [1.41] failure:
> ``UNION'' expected but identifier .time found
>
> SELECT name FROM people WHERE schools[0].time>2
>                                         ^
>         at scala.sys.package$.error(package.scala:27)
>         at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
>         at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:69)
>         at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:185)
>         at SimpleApp$.main(SimpleApp.scala:32)
>         at SimpleApp.main(SimpleApp.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
>
> It seems this is not supported.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Query-the-nested-JSON-data-With-Spark-SQL-1-0-1-tp9544p9731.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
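[Editor's note] Setting the parser bugs aside, the thread leaves open what the
predicate is meant to mean: "schools[0].time > 2" tests only the first array
element, while the user's original "schools.time > 2" presumably means "any
school with time > 2". The sketch below pins down that distinction in plain
Scala collections, with no Spark involved; the case classes mirror the JSON
records above but are purely illustrative, not Spark API.

```scala
// Plain-Scala sketch (no Spark): the shape of the people.json records,
// and the two possible readings of the nested-array predicate.
case class School(name: String, time: Int)
case class Person(name: String, schools: Seq[School])

val people = Seq(
  Person("Michael", Seq(School("ABC", 1994), School("EFG", 2000))),
  Person("Andy", Seq.empty),   // no "schools" field in the JSON
  Person("Justin", Seq.empty)
)

// Reading 1 — "schools[0].time > 2": only the first element is tested,
// and records with an empty array never match.
val firstOnly = people.filter(_.schools.headOption.exists(_.time > 2)).map(_.name)

// Reading 2 — "some school has time > 2": likely the intended semantics
// of the original "schools.time > 2".
val anySchool = people.filter(_.schools.exists(_.time > 2)).map(_.name)

println(firstOnly) // List(Michael)
println(anySchool) // List(Michael)
```

On this tiny data set both readings return the same names, but they diverge as
soon as an array's first element fails the predicate while a later one passes.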