...row_number().over(overLocation)).filter("rowNumber<=50")
sortedDF.write.saveAsTable("house_id_pv_location_top50")

Thank you guys.
Thanks&Best regards!
San.Luo
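For reference, the truncated snippet at the top of this message appears to be the row_number-based approach. A minimal end-to-end sketch of that idea, assuming a source table with columns id, pv, location (the input table name and the local-mode session are assumptions, not from the thread):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{row_number, desc}

object Top50PerLocation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("top50-per-location")
      .master("local[*]")
      .getOrCreate()

    // Assumed input: a table with columns id, pv, location.
    val df = spark.read.table("house_id_pv_location") // hypothetical source table

    // Number rows within each location, highest pv first. Unlike rank(),
    // row_number() gives ties distinct consecutive numbers, so the filter
    // below keeps at most 50 rows per location.
    val overLocation = Window.partitionBy("location").orderBy(desc("pv"))

    val sortedDF = df
      .withColumn("rowNumber", row_number().over(overLocation))
      .filter("rowNumber <= 50")

    sortedDF.write.saveAsTable("house_id_pv_location_top50")

    spark.stop()
  }
}
```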
----- Original Message -----
From: Anton Okolnychyi
To: 罗辉, user
Subject: Re: Re: how to select first 50 value of each group after group by?
> I tried the rank API; however, it is not the API I want, because some
> values that have the same pv are ranked with the same value, and the first
> 50 rows of each frame are what I'm expecting. The attached file shows what
> I got by using rank.
> Thank you anyway, I learnt what rank could provide from your advice.
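This tie behaviour is the key difference between rank() and row_number(): rank() assigns tied pv values the same rank and skips the following ranks, so filtering on `rank <= 50` can return more than 50 rows per group, while row_number() is always consecutive. A small sketch of the comparison, assuming a DataFrame `df` with the columns from the original question:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{rank, row_number, desc}

val w = Window.partitionBy("location").orderBy(desc("pv"))

// With pv values (10, 10, 9) inside one location:
//   rank().over(w)       -> 1, 1, 3  (ties share a rank; the next rank is skipped)
//   row_number().over(w) -> 1, 2, 3  (always consecutive; ties broken arbitrarily)
// So "rank <= 50" may keep more than 50 rows per location,
// while "rowNumber <= 50" keeps exactly the first 50.
val top50 = df
  .withColumn("rowNumber", row_number().over(w))
  .filter("rowNumber <= 50")
```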
Thanks&Best regards!
San.Luo
----- Original Message -----
From: Anton Okolnychyi
To: user
Cc: luohui20...@sina.com
Subject: Re: how to select first 50 value of each group after group by?
Date: 2016-07-06 23:22
The following resources should be useful:
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-windows.html
The last link should have the exact solution.
2016-07-06 16:55 GMT+02:00 Tal Gry
You can use the rank window function to rank each row in the group, and then
filter the rows with rank <= 50.
On Wed, Jul 6, 2016, 14:07 wrote:
> hi there
> I have a DF with 3 columns: id, pv, location. (The rows are already
> grouped by location and sorted by pv in descending order.) I wanna get the
> first 50 id values grouped by location.
hi there
I have a DF with 3 columns: id, pv, location. (The rows are already grouped
by location and sorted by pv in descending order.) I wanna get the first 50
id values grouped by location. I checked the APIs of DataFrame, GroupedData,
and pairRDD, and found no match. Is there a way to do this naturally?