scala> locations.queryExecution
warning: there were 1 feature warning(s); re-run with -feature for details
res28: _4.sqlContext.QueryExecution forSome { val _4: org.apache.spark.sql.SchemaRDD } =
== Parsed Logical Plan ==
SparkLogicalPlan (ExistingRdd [locationName#80,locationNumber#81], MappedRDD[99] at map at JsonRDD.scala:38)

== Analyzed Logical Plan ==
SparkLogicalPlan (ExistingRdd [locationName#80,locationNumber#81], MappedRDD[99] at map at JsonRDD.scala:38)

== Optimized Logical Plan ==
SparkLogicalPlan (ExistingRdd [locationName#80,locationNumber#81], MappedRDD[99] at map at JsonRDD.scala:38)

== Physical Plan ==
ExistingRdd [locationName#80,locationNumber#81], MappedRDD[99] at map at JsonRDD.scala:38

Code Generation: false
== RDD ==

scala> people.queryExecution
warning: there were 1 feature warning(s); re-run with -feature for details
res29: _5.sqlContext.QueryExecution forSome { val _5: org.apache.spark.sql.SchemaRDD } =
== Parsed Logical Plan ==
SparkLogicalPlan (ExistingRdd [age#9,locations#10,name#11], MappedRDD[28] at map at JsonRDD.scala:38)

== Analyzed Logical Plan ==
SparkLogicalPlan (ExistingRdd [age#9,locations#10,name#11], MappedRDD[28] at map at JsonRDD.scala:38)

== Optimized Logical Plan ==
SparkLogicalPlan (ExistingRdd [age#9,locations#10,name#11], MappedRDD[28] at map at JsonRDD.scala:38)

== Physical Plan ==
ExistingRdd [age#9,locations#10,name#11], MappedRDD[28] at map at JsonRDD.scala:38

Code Generation: false
== RDD ==

And here's what I get when I try executing the join with the LATERAL VIEW explode():

14/10/28 23:05:35 INFO ParseDriver: Parse Completed
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'p.name,'p.age, tree:
Project ['p.name,'p.age]
 Filter ('location.number = 2300)
  Join Inner, Some((location#110.number AS number#111 = 'ln.streetnumber))
   Generate explode(locations#10), true, false, Some(l)
    LowerCaseSchema
     Subquery p
      Subquery people
       SparkLogicalPlan (ExistingRdd [age#9,locations#10,name#11], MappedRDD[28] at map at JsonRDD.scala:38)
   LowerCaseSchema
    Subquery ln
     Subquery locationNames
      SparkLogicalPlan (ExistingRdd [locationName#80,locationNumber#81], MappedRDD[99] at map at JsonRDD.scala:38)

    at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
    at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
    at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
    at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:397)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:397)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.optimizedPlan$lzycompute(HiveContext.scala:358)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.optimizedPlan(HiveContext.scala:357)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400)
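In the meantime I'm going to try working around it by materializing the exploded rows as their own table and joining in a second step. This is just an untested sketch: it assumes hctx is the HiveContext and the people/locationNames tables from the queries quoted below, and "peopleLocations" is a made-up name for the intermediate table.

// Untested workaround sketch: run the LATERAL VIEW explode() on its own,
// register the result as a temp table, then join against that table.
// Assumes hctx is a HiveContext (Spark 1.1) with the people and
// locationNames tables already registered.
val exploded = hctx.sql(
  "SELECT p.name, p.age, location.number AS streetNumber " +
  "FROM people p LATERAL VIEW explode(locations) l AS location")
exploded.registerTempTable("peopleLocations")

hctx.sql(
  "SELECT pl.name, pl.age, ln.locationName " +
  "FROM peopleLocations pl JOIN locationNames ln " +
  "ON pl.streetNumber = ln.streetNumber " +
  "WHERE pl.streetNumber = '2300'").collect()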
On Tue, Oct 28, 2014 at 10:48 PM, Michael Armbrust <mich...@databricks.com> wrote:

> Can you println the .queryExecution of the SchemaRDD?
>
> On Tue, Oct 28, 2014 at 7:43 PM, Corey Nolet <cjno...@gmail.com> wrote:
>
>> So this appears to work just fine:
>>
>> hctx.sql("SELECT p.name, p.age FROM people p LATERAL VIEW
>> explode(locations) l AS location JOIN location5 lo ON l.number =
>> lo.streetNumber WHERE location.number = '2300'").collect()
>>
>> But as soon as I try to join with another set based on a property from
>> the exploded locations set, I get invalid attribute exceptions:
>>
>> hctx.sql("SELECT p.name, p.age, ln.locationName FROM people as p
>> LATERAL VIEW explode(locations) l AS location JOIN locationNames ln ON
>> location.number = ln.streetNumber WHERE location.number = '2300'").collect()
>>
>> On Tue, Oct 28, 2014 at 10:19 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>>> On Tue, Oct 28, 2014 at 6:56 PM, Corey Nolet <cjno...@gmail.com> wrote:
>>>
>>>> Am I able to do a join on an exploded field?
>>>>
>>>> For example, if I have another object:
>>>>
>>>> { "streetNumber": "2300", "locationName": "The Big Building" }
>>>>
>>>> and I want to join it with the previous JSON on the locations[].number
>>>> field, is that possible?
>>>
>>> I'm not sure I fully understand the question, but once it's exploded it's
>>> a normal tuple and you can do any operations on it. The explode is just
>>> producing a new row for each element in the array.
>>>
>>>> Awesome, this is what I was looking for. So it's possible to use the
>>>> Hive dialect in a regular SQL context? This is what was confusing to me:
>>>> the docs kind of allude to it but don't directly point it out.
>>>
>>> No, you need a HiveContext, as we use the actual Hive parser (SQLContext
>>> only exists as a separate entity so that people who don't want Hive's
>>> dependencies in their app can still use a limited subset of Spark SQL).
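P.S. For anyone who lands on this thread later, the setup behind the snippets above looks roughly like this. A sketch only: the JSON file paths are made-up placeholders, and sc is the SparkContext the shell provides.

// Sketch: HiveQL features such as LATERAL VIEW explode() require a
// HiveContext rather than a plain SQLContext. The JSON paths below
// are hypothetical.
import org.apache.spark.sql.hive.HiveContext

val hctx = new HiveContext(sc)
hctx.jsonFile("people.json").registerTempTable("people")
hctx.jsonFile("locationNames.json").registerTempTable("locationNames")
hctx.sql("SELECT name, age FROM people").collect()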