Can this be done using DFs?
scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

scala> val d = HiveContext.table("test.dummy")
d: org.apache.spark.sql.DataFrame = [id: int, clustered: int, scattered: int, randomised: int, random_string: string, small_vc: string, padding: string]

scala> var m = d.agg(max($"id"))
m: org.apache.spark.sql.DataFrame = [max(id): int]

How can I join these two? In other words, I want to get all rows whose id equals the max(id) held in m. Something like d.filter($"id" = m)?

Thanks

On 25/02/2016 22:58, Mohammad Tariq wrote:

AFAIK, this isn't supported yet. A ticket is in progress though:
https://issues.apache.org/jira/browse/SPARK-4226

Tariq, Mohammad
about.me/mti

On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh
<mich.talebza...@cloudtechnologypartners.co.uk> wrote:

> Hi,
>
> I guess the following confirms that Spark does not support sub-queries:
>
> val d = HiveContext.table("test.dummy")
> d.registerTempTable("tmp")
> HiveContext.sql("select * from tmp where id IN (select max(id) from tmp)")
>
> It crashes.
>
> The same SQL works OK in Hive itself on the underlying table:
>
> select * from dummy where id IN (select max(id) from dummy);
>
> Thanks
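One way to do this without sub-query support, as a minimal sketch (untested, assuming the same Spark 1.x shell session and the `d` DataFrame above; the names maxId, rows1, rows2 and the max_id alias are mine). Note that d.filter($"id" = m) cannot work as written, both because = is Scala assignment (Column equality is ===) and because m is a one-row DataFrame rather than a plain value:

import org.apache.spark.sql.functions.max
import HiveContext.implicits._   // for the $"col" syntax; pre-imported in the shell

// Option 1: bring max(id) back to the driver as a plain Int, then filter on it.
val maxId = d.agg(max($"id")).first().getInt(0)
val rows1 = d.filter($"id" === maxId)

// Option 2: stay distributed by rewriting the IN sub-query as a join
// against the one-row aggregate.
val m = d.agg(max($"id").as("max_id"))
val rows2 = d.join(m, d("id") === m("max_id"))

Option 1 triggers an extra job to collect the aggregate to the driver; option 2 keeps everything lazy inside Spark, which is closer to what the IN sub-query in the quoted message would do.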