Sorry for the trouble. There are two issues here:

- Parsing of repeated nested fields (i.e. something[0].field) is not
  supported in the plain SQL parser. SPARK-2096
  <https://issues.apache.org/jira/browse/SPARK-2096>
- Resolution of nested fields is broken in the HiveQL parser. SPARK-2483
  <https://issues.apache.org/jira/browse/SPARK-2483>
The latter issue is fixed now: #1411
<https://github.com/apache/spark/pull/1411>

Michael

On Mon, Jul 14, 2014 at 11:38 PM, anyweil <wei...@gmail.com> wrote:

> Thank you so much for the reply, here is my code:
>
> 1.  val conf = new SparkConf().setAppName("Simple Application")
> 2.  conf.setMaster("local")
> 3.  val sc = new SparkContext(conf)
> 4.  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> 5.  import sqlContext.createSchemaRDD
> 6.  val path1 = "./data/people.json"
> 7.  val people = sqlContext.jsonFile(path1)
> 8.  people.registerAsTable("people")
> 9.  var sql = "SELECT name FROM people WHERE schools.time>2"
> 10. val result = sqlContext.sql(sql)
> 11. result.collect().foreach(println)
>
> The content of people.json is:
>
> {"name":"Michael",
>  "schools":[{"name":"ABC","time":1994},{"name":"EFG","time":2000}]}
> {"name":"Andy", "age":30, "scores":{"eng":98,"phy":89}}
> {"name":"Justin", "age":19}
>
> What I have tried is:
>
> *1. Use HiveQL:*
> I have tried to replace line 4 with
>     val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
> and line 10 with
>     val result = sqlContext.hql(sql)
> (I have recompiled the Spark jar with Hive support), but I got the same
> error.
>
> *2. Use [] for the access:*
> I have tried to replace line 9 with
>     var sql = "SELECT name FROM people WHERE schools[0].time>2"
> but got the error:
>
> 14/07/15 14:37:49 INFO SparkContext: Job finished: reduce at
> JsonRDD.scala:40, took 0.98412 s
> Exception in thread "main" java.lang.RuntimeException: [1.41] failure:
> ``UNION'' expected but identifier .time found
>
> SELECT name FROM people WHERE schools[0].time>2
>                                         ^
>         at scala.sys.package$.error(package.scala:27)
>         at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
>         at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:69)
>         at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:185)
>         at SimpleApp$.main(SimpleApp.scala:32)
>         at SimpleApp.main(SimpleApp.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
>
> It seems this is not supported.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Query-the-nested-JSON-data-With-Spark-SQL-1-0-1-tp9544p9731.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
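[Editor's note] Setting the parser bugs aside, the thread leaves open what the
predicate is meant to mean: "schools[0].time > 2" tests only the first array
element, while the user's original "schools.time > 2" presumably means "any
school with time > 2". The sketch below pins down that distinction in plain
Scala collections, with no Spark involved; the case classes mirror the JSON
records above but are purely illustrative, not Spark API.

```scala
// Plain-Scala sketch (no Spark): the shape of the people.json records,
// and the two possible readings of the nested-array predicate.
case class School(name: String, time: Int)
case class Person(name: String, schools: Seq[School])

val people = Seq(
  Person("Michael", Seq(School("ABC", 1994), School("EFG", 2000))),
  Person("Andy", Seq.empty),   // no "schools" field in the JSON
  Person("Justin", Seq.empty)
)

// Reading 1 — "schools[0].time > 2": only the first element is tested,
// and records with an empty array never match.
val firstOnly = people.filter(_.schools.headOption.exists(_.time > 2)).map(_.name)

// Reading 2 — "some school has time > 2": likely the intended semantics
// of the original "schools.time > 2".
val anySchool = people.filter(_.schools.exists(_.time > 2)).map(_.name)

println(firstOnly) // List(Michael)
println(anySchool) // List(Michael)
```

On this tiny data set both readings return the same names, but they diverge as
soon as an array's first element fails the predicate while a later one passes.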