Thanks, much appreciated.

On 26 February 2016 at 09:54, Michał Zieliński <zielinski.mich...@gmail.com> wrote:
> Spark has great documentation
> <https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.package>
> and guides <https://spark.apache.org/docs/latest/programming-guide.html>:
>
> lit and col are here
> <https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.package>
> getInt is here
> <https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.Row>
> apply(0) is just a method on Array, which is what collect returns (here
> <https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.DataFrame>)
>
> On 26 February 2016 at 10:47, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Thanks Michael. Great.
>>
>> d.filter(col("id") === lit(m)).show
>>
>> BTW, where are all these methods like lit etc. documented? Also, I guess
>> any accessor call like apply(0) or getInt(0) refers to the "current"
>> element?
>>
>> Regards
>>
>> On 26 February 2016 at 09:42, Michał Zieliński <
>> zielinski.mich...@gmail.com> wrote:
>>
>>> You need to collect the value:
>>>
>>> val m: Int = d.agg(max($"id")).collect.apply(0).getInt(0)
>>> d.filter(col("id") === lit(m))
>>>
>>> On 26 February 2016 at 09:41, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>>> Can this be done using DFs?
>>>>
>>>> scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>>
>>>> scala> val d = HiveContext.table("test.dummy")
>>>> d: org.apache.spark.sql.DataFrame = [id: int, clustered: int,
>>>> scattered: int, randomised: int, random_string: string, small_vc: string,
>>>> padding: string]
>>>>
>>>> scala> var m = d.agg(max($"id"))
>>>> m: org.apache.spark.sql.DataFrame = [max(id): int]
>>>>
>>>> How can I join these two? In other words, I want to get all rows with
>>>> id = m here.
>>>>
>>>> d.filter($"id" = m) ?
>>>>
>>>> Thanks
>>>>
>>>> On 25/02/2016 22:58, Mohammad Tariq wrote:
>>>>
>>>> AFAIK, this isn't supported yet. A ticket
>>>> <https://issues.apache.org/jira/browse/SPARK-4226> is in progress,
>>>> though.
>>>>
>>>> Tariq, Mohammad
>>>> about.me/mti
>>>>
>>>> On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh <
>>>> mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I guess the following confirms that Spark does not support sub-queries:
>>>>>
>>>>> val d = HiveContext.table("test.dummy")
>>>>> d.registerTempTable("tmp")
>>>>> HiveContext.sql("select * from tmp where id IN (select max(id) from
>>>>> tmp)")
>>>>>
>>>>> It crashes.
>>>>>
>>>>> The same SQL works fine in Hive itself on the underlying table:
>>>>>
>>>>> select * from dummy where id IN (select max(id) from dummy);
>>>>>
>>>>> Thanks
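
---

For reference, a minimal end-to-end sketch of the two workarounds discussed
in this thread. It assumes Spark 1.x in the spark-shell (so sc is already in
scope), and the test.dummy table with an integer id column from the examples
above; the variable name hc is illustrative:

    import org.apache.spark.sql.functions.{col, lit, max}

    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    val d = hc.table("test.dummy")

    // Workaround 1: collect the aggregate down to the driver as a plain
    // Int, then filter against it as a literal column.
    val m: Int = d.agg(max(col("id"))).collect().apply(0).getInt(0)
    d.filter(col("id") === lit(m)).show()

    // Workaround 2: stay entirely on the cluster by joining against the
    // one-row aggregate, avoiding the driver round trip.
    val maxDf = d.agg(max(col("id")).as("max_id"))
    d.join(maxDf, d("id") === maxDf("max_id")).show()

Both produce the rows that the "id IN (select max(id) from tmp)" sub-query
is after; the join form is roughly what such a sub-query would be rewritten
to once the support tracked in SPARK-4226 lands.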