Hi all, I am using Spark SQL 1.0.1 for a simple test. The loaded data (JSON format), registered as the table "people", is:

    {"name":"Michael", "schools":[{"name":"ABC","time":1994},{"name":"EFG","time":2000}]}
    {"name":"Andy", "age":30,"scores":{"eng":98,"phy":89}}
    {"name":"Justin", "age":19}

The "schools" field holds repeated values of the form {"name":"XXX","time":X}. How should I write the SQL to select the people who have a school named "ABC"? I tried

    SELECT name FROM people WHERE schools.name = 'ABC'

but it fails with:

    [error] (run-main-0) org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'name, tree:
    [error] Project ['name]
    [error]  Filter ('schools.name = ABC)
    [error]   Subquery people
    [error]    ParquetRelation people.parquet, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml)
    org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'name, tree:
    Project ['name]
     Filter ('schools.name = ABC)
      Subquery people
       ParquetRelation people.parquet, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:71)
        ...

Could anybody show me how to write the correct SQL for searching a repeated data item in Spark SQL?

Thank you!
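
P.S. To make the intent concrete, here is a minimal sketch of the result I am after, written in plain Python with the standard json module rather than Spark SQL. It only illustrates the semantics ("keep a person if any element of schools has name 'ABC'"); the helper name people_with_school is made up for this example and is not part of any Spark API.

```python
import json

# The three records from the post, one JSON object per line.
RAW = """\
{"name":"Michael", "schools":[{"name":"ABC","time":1994},{"name":"EFG","time":2000}]}
{"name":"Andy", "age":30,"scores":{"eng":98,"phy":89}}
{"name":"Justin", "age":19}
"""


def people_with_school(lines, school_name):
    """Return the names of people who have at least one school with the given name.

    Hypothetical helper for illustration only; not a Spark function.
    """
    matches = []
    for line in lines.splitlines():
        if not line.strip():
            continue  # skip blank lines
        person = json.loads(line)
        # "schools" is optional; when present it is a list of {"name", "time"} objects.
        schools = person.get("schools", [])
        if any(school.get("name") == school_name for school in schools):
            matches.append(person["name"])
    return matches


result = people_with_school(RAW, "ABC")
print(result)  # ['Michael']
```

So for the data above the query I am looking for should return only Michael; Andy and Justin have no "schools" field at all and should simply be filtered out.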