Thanks Michael. Great

 d.filter(col("id") === lit(m)).show

BTW, where are all these methods like lit documented? Also, am I right that
accessor calls like apply(0) or getInt(0) refer to the "current" row/column?
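For reference, this is how I read that snippet (a sketch, assuming the Spark
1.6-era DataFrame API used in this thread, and that `sqlContext.implicits._`
is imported for the `$` syntax): `lit`, `col` and `max` come from
`org.apache.spark.sql.functions`, `collect` returns an `Array[Row]`, and the
accessors index into that array and then into the row.

```scala
import org.apache.spark.sql.functions.{col, lit, max}

// agg(max(...)) yields a one-row DataFrame; collect materialises it
// on the driver as an Array[Row].
val rows = d.agg(max($"id")).collect

// apply(0) takes the first (and only) Row from the array;
// getInt(0) reads the first column of that Row as an Int.
val m: Int = rows.apply(0).getInt(0)

// lit(m) wraps the plain Scala Int as a Column so it can be
// compared with === inside filter.
d.filter(col("id") === lit(m)).show
```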

Regards

On 26 February 2016 at 09:42, Michał Zieliński <zielinski.mich...@gmail.com>
wrote:

> You need to collect the value.
>
> val m: Int = d.agg(max($"id")).collect.apply(0).getInt(0)
> d.filter(col("id") === lit(m))
>
> On 26 February 2016 at 09:41, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Can this be done using DFs?
>>
>>
>>
>> scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>
>> scala> val d = HiveContext.table("test.dummy")
>> d: org.apache.spark.sql.DataFrame = [id: int, clustered: int, scattered:
>> int, randomised: int, random_string: string, small_vc: string, padding:
>> string]
>>
>> scala>  var m = d.agg(max($"id"))
>> m: org.apache.spark.sql.DataFrame = [max(id): int]
>>
>> How can I join these two? In other words, how do I get all rows with id =
>> m here?
>>
>> d.filter($"id" = m)  ?
>>
>> Thanks
>>
>> On 25/02/2016 22:58, Mohammad Tariq wrote:
>>
>> AFAIK, this isn't supported yet. A ticket
>> <https://issues.apache.org/jira/browse/SPARK-4226> is in progress though.
>>
>>
>>
>>
>> Tariq, Mohammad
>> about.me/mti
>>
>>
>>
>> On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh <
>> mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> I guess the following confirms that Spark does not support sub-queries.
>>>
>>>
>>>
>>> val d = HiveContext.table("test.dummy")
>>>
>>> d.registerTempTable("tmp")
>>>
>>> HiveContext.sql("select * from tmp where id IN (select max(id) from
>>> tmp)")
>>>
>>> It crashes.
>>>
>>> The SQL works OK in Hive itself on the underlying table!
>>>
>>> select * from dummy where id IN (select max(id) from dummy);
>>>
>>>
>>>
>>> Thanks
>>>
>>
>
