Thanks, much appreciated.

On 26 February 2016 at 09:54, Michał Zieliński <zielinski.mich...@gmail.com> wrote:
> Spark has great documentation
> <https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.package>
> and guides <https://spark.apache.org/docs/latest/programming-guide.html>:
>
> lit and col are here
> <https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.package>
> getInt is here
> <https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.Row>
> apply(0) is just a method on Array, which is what collect returns (here
> <https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.DataFrame>)
>
> On 26 February 2016 at 10:47, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Thanks Michael. Great.
>>
>> d.filter(col("id") === lit(m)).show
>>
>> BTW, where are all these methods like lit etc. documented? Also, I guess
>> any accessor call like apply(0) or getInt(0) refers to the "current"
>> element?
>>
>> Regards
>>
>> On 26 February 2016 at 09:42, Michał Zieliński <
>> zielinski.mich...@gmail.com> wrote:
>>
>>> You need to collect the value:
>>>
>>> val m: Int = d.agg(max($"id")).collect.apply(0).getInt(0)
>>> d.filter(col("id") === lit(m))
>>>
>>> On 26 February 2016 at 09:41, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>>> Can this be done using DFs?
>>>>
>>>> scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>>
>>>> scala> val d = HiveContext.table("test.dummy")
>>>> d: org.apache.spark.sql.DataFrame = [id: int, clustered: int,
>>>> scattered: int, randomised: int, random_string: string, small_vc: string,
>>>> padding: string]
>>>>
>>>> scala> var m = d.agg(max($"id"))
>>>> m: org.apache.spark.sql.DataFrame = [max(id): int]
>>>>
>>>> How can I join these two? In other words, I want to get all rows with
>>>> id = m here.
>>>>
>>>> d.filter($"id" = m) ?
>>>>
>>>> Thanks
>>>>
>>>> On 25/02/2016 22:58, Mohammad Tariq wrote:
>>>>
>>>> AFAIK, this isn't supported yet. A ticket
>>>> <https://issues.apache.org/jira/browse/SPARK-4226> is in progress,
>>>> though.
>>>>
>>>> Tariq, Mohammad
>>>> about.me/mti
>>>>
>>>> On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh <
>>>> mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I guess the following confirms that Spark does not support sub-queries:
>>>>>
>>>>> val d = HiveContext.table("test.dummy")
>>>>> d.registerTempTable("tmp")
>>>>> HiveContext.sql("select * from tmp where id IN (select max(id) from
>>>>> tmp)")
>>>>>
>>>>> It crashes.
>>>>>
>>>>> The same SQL works fine in Hive itself on the underlying table:
>>>>>
>>>>> select * from dummy where id IN (select max(id) from dummy);
>>>>>
>>>>> Thanks
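
---

For reference, a minimal end-to-end sketch of the two workarounds discussed
in this thread. It assumes Spark 1.x in the spark-shell (so sc is already in
scope), and the test.dummy table with an integer id column from the examples
above; the variable name hc is illustrative:

    import org.apache.spark.sql.functions.{col, lit, max}

    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    val d = hc.table("test.dummy")

    // Workaround 1: collect the aggregate down to the driver as a plain
    // Int, then filter against it as a literal column.
    val m: Int = d.agg(max(col("id"))).collect().apply(0).getInt(0)
    d.filter(col("id") === lit(m)).show()

    // Workaround 2: stay entirely on the cluster by joining against the
    // one-row aggregate, avoiding the driver round trip.
    val maxDf = d.agg(max(col("id")).as("max_id"))
    d.join(maxDf, d("id") === maxDf("max_id")).show()

Both produce the rows that the "id IN (select max(id) from tmp)" sub-query
is after; the join form is roughly what such a sub-query would be rewritten
to once the support tracked in SPARK-4226 lands.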