Could anybody help? I googled and read a lot, but didn’t find anything helpful.
Or, to make the question simpler: how do I assign a row number within each group, i.e. something like

    SELECT a, ROW_NUMBER() OVER (PARTITION BY a) AS num FROM table

2014-08-20 15:52 GMT+08:00 Fengyun RAO <raofeng...@gmail.com>:

> I have a table with 4 columns: a, b, c, time.
>
> What I need is something like:
>
>     SELECT a, b, GroupFirst(c)
>     FROM t
>     GROUP BY a, b
>
> GroupFirst means the "first" value of column c in each group,
> where by "first" I mean the row with the minimal "time" in that group.
>
> In Oracle/SQL Server, we could write:
>
>     WITH summary AS (
>       SELECT a, b, c,
>              ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY time) AS num
>       FROM t)
>     SELECT s.*
>     FROM summary s
>     WHERE s.num = 1
>
> but Spark SQL has no such thing as ROW_NUMBER().
>
> How can I achieve this?
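One workaround, as long as Spark SQL lacks window functions, is to rewrite the ROW_NUMBER() query as an aggregate plus a self-join, which Spark SQL does support. A minimal sketch of that rewrite follows; it is demonstrated on SQLite purely so the query is runnable here (in Spark you would submit the same SQL string through your SQL context), and the table/column names just follow the example above:

```python
import sqlite3

# Rewrite of the ROW_NUMBER()/num = 1 query without window functions:
# compute MIN(time) per (a, b) group, then join back to pick those rows.
FIRST_PER_GROUP = """
SELECT t.a, t.b, t.c
FROM t
JOIN (SELECT a, b, MIN(time) AS min_time
      FROM t
      GROUP BY a, b) m
  ON t.a = m.a AND t.b = m.b AND t.time = m.min_time
"""

def group_first(rows):
    # rows: iterable of (a, b, c, time) tuples
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (a TEXT, b TEXT, c TEXT, time INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    return sorted(con.execute(FIRST_PER_GROUP).fetchall())
```

One caveat: unlike the ROW_NUMBER() version, this join returns *all* rows that tie for the minimal time in a group, so if ties are possible you need an extra tie-breaker (e.g. also aggregating on another column).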