Can this be done using DFs?
scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

scala> val d = HiveContext.table("test.dummy")
d: org.apache.spark.sql.DataFrame = [id: int, clustered: int, scattered: int, randomised: int, random_string: string, small_vc: string, padding: string]

scala> var m = d.agg(max($"id"))
m: org.apache.spark.sql.DataFrame = [max(id): int]

How can I join these two? In other words, I want to get all rows whose id equals the max(id) held in m. Something like d.filter($"id" = m)?

Thanks

On 25/02/2016 22:58, Mohammad Tariq wrote:

AFAIK, this isn't supported yet. A ticket is in progress though:
https://issues.apache.org/jira/browse/SPARK-4226

Tariq, Mohammad
about.me/mti

On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh
<mich.talebza...@cloudtechnologypartners.co.uk> wrote:

> Hi,
>
> I guess the following confirms that Spark does not support sub-queries:
>
> val d = HiveContext.table("test.dummy")
> d.registerTempTable("tmp")
> HiveContext.sql("select * from tmp where id IN (select max(id) from tmp)")
>
> It crashes.
>
> The same SQL works OK in Hive itself on the underlying table:
>
> select * from dummy where id IN (select max(id) from dummy);
>
> Thanks
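One way to do this without sub-query support, as a minimal sketch (untested, assuming the same Spark 1.x shell session and the `d` DataFrame above; the names maxId, rows1, rows2 and the max_id alias are mine). Note that d.filter($"id" = m) cannot work as written, both because = is Scala assignment (Column equality is ===) and because m is a one-row DataFrame rather than a plain value:

import org.apache.spark.sql.functions.max
import HiveContext.implicits._   // for the $"col" syntax; pre-imported in the shell

// Option 1: bring max(id) back to the driver as a plain Int, then filter on it.
val maxId = d.agg(max($"id")).first().getInt(0)
val rows1 = d.filter($"id" === maxId)

// Option 2: stay distributed by rewriting the IN sub-query as a join
// against the one-row aggregate.
val m = d.agg(max($"id").as("max_id"))
val rows2 = d.join(m, d("id") === m("max_id"))

Option 1 triggers an extra job to collect the aggregate to the driver; option 2 keeps everything lazy inside Spark, which is closer to what the IN sub-query in the quoted message would do.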