Could anybody help? I googled and read a lot, but didn’t find anything helpful.
Or, to make the question simpler: how do I assign a row number within each group, i.e. something like

    SELECT a, ROW_NUMBER() OVER (PARTITION BY a) AS num FROM table

2014-08-20 15:52 GMT+08:00 Fengyun RAO <raofeng...@gmail.com>:

> I have a table with 4 columns: a, b, c, time.
>
> What I need is something like:
>
>     SELECT a, b, GroupFirst(c)
>     FROM t
>     GROUP BY a, b
>
> GroupFirst means the "first" value of column c in each group,
> where by "first" I mean the row with the minimal "time" in that group.
>
> In Oracle/SQL Server, we could write:
>
>     WITH summary AS (
>       SELECT a, b, c,
>              ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY time) AS num
>       FROM t)
>     SELECT s.*
>     FROM summary s
>     WHERE s.num = 1
>
> but Spark SQL has no such thing as ROW_NUMBER().
>
> How can I achieve this?
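One workaround, as long as Spark SQL lacks window functions, is to rewrite the ROW_NUMBER() query as an aggregate plus a self-join, which Spark SQL does support. A minimal sketch of that rewrite follows; it is demonstrated on SQLite purely so the query is runnable here (in Spark you would submit the same SQL string through your SQL context), and the table/column names just follow the example above:

```python
import sqlite3

# Rewrite of the ROW_NUMBER()/num = 1 query without window functions:
# compute MIN(time) per (a, b) group, then join back to pick those rows.
FIRST_PER_GROUP = """
SELECT t.a, t.b, t.c
FROM t
JOIN (SELECT a, b, MIN(time) AS min_time
      FROM t
      GROUP BY a, b) m
  ON t.a = m.a AND t.b = m.b AND t.time = m.min_time
"""

def group_first(rows):
    # rows: iterable of (a, b, c, time) tuples
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (a TEXT, b TEXT, c TEXT, time INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    return sorted(con.execute(FIRST_PER_GROUP).fetchall())
```

One caveat: unlike the ROW_NUMBER() version, this join returns *all* rows that tie for the minimal time in a group, so if ties are possible you need an extra tie-breaker (e.g. also aggregating on another column).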