Re: [R] get top n rows group by a column from a dataframe

Dennis Murphy Thu, 16 Sep 2010 12:28:51 -0700

Hi:

You already have some good solutions; here is another using the plyr
package, based on Phil Spector's test data with a slight modification to
the salary column:

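set.seed(1)   # make the random test data reproducible
sdata = data.frame(company = sample(LETTERS[1:8], 1000, replace = TRUE),
                   person  = 1:1000,
                   salary  = trunc(rnorm(1000, mean = 50000, sd = 10000)))
library(plyr)

# Function to pick out the top 5 salaries within one company's rows
f <- function(df) {
     if(nrow(df) == 0L) return(NULL)
     d <- df[order(df$salary), ]   # sort by salary, ascending (df['salary'] would be a data frame, not a vector)
     tail(d, 5)                    # last 5 rows = 5 highest; tail() returns all rows if there are fewer than 5
    }
topguys <- ddply(sdata, 'company', f)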
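If plyr is not available, roughly the same per-company selection can be sketched in base R with split(), lapply() and head(). This is not one of the solutions posted earlier in the thread, and top_n_by_group is a hypothetical helper name, not a base R function; it sorts each company's rows by salary in decreasing order and keeps the first n.

```r
# Base-R sketch (no plyr): keep the n highest values of `value` within each `group`.
# `top_n_by_group` is a hypothetical helper, not part of base R or plyr.
top_n_by_group <- function(data, group, value, n = 5) {
  pieces <- split(data, data[[group]])               # one data frame per company
  tops <- lapply(pieces, function(d) {
    d <- d[order(d[[value]], decreasing = TRUE), ]   # highest salary first
    head(d, n)                                       # all rows if the group has fewer than n
  })
  out <- do.call(rbind, tops)                        # stack the per-company pieces
  rownames(out) <- NULL
  out
}

set.seed(1)
sdata <- data.frame(company = sample(LETTERS[1:8], 1000, replace = TRUE),
                    person  = 1:1000,
                    salary  = trunc(rnorm(1000, mean = 50000, sd = 10000)))
topguys2 <- top_n_by_group(sdata, "company", "salary", n = 5)
```

Unlike f above, this keeps each company's rows in decreasing salary order.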

This gives you both the top five salaries and the people who earned them,
which might be helpful (or not, if confidentiality is a concern; in that
case, use -2 as the column index of d in f, i.e. d[, -2], to drop the
person column).
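A minimal sketch of that confidentiality variant, assuming the column layout company, person, salary (so person is column 2). The name f_anon is hypothetical; it is f with the person column dropped when the rows are selected.

```r
# Variant of f that drops column 2 (person), so only company and salary remain.
# `f_anon` is a hypothetical name used for illustration.
f_anon <- function(df) {
  if (nrow(df) == 0L) return(NULL)
  d <- df[order(df$salary), -2, drop = FALSE]   # sort ascending, drop 'person'
  tail(d, 5)                                    # the 5 highest salaries (or all, if fewer)
}

# Quick check on a single company's rows, without plyr:
one_co <- data.frame(company = "A", person = 1:7, salary = c(5, 1, 7, 3, 6, 2, 4))
f_anon(one_co)
```

With plyr loaded, ddply(sdata, 'company', f_anon) applies it per company, just as f is used above.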

HTH,
Dennis

On Thu, Sep 16, 2010 at 8:39 AM, Tan, Richard <r...@panagora.com> wrote:

> Hi, is there an R function like SQL's TOP keyword?
>
> I have a dataframe that has 3 columns: company, person, salary.
>
> How do I get the top 5 highest-paid people for each company, and if I have
> fewer than 5 people for a company, just return all of them?
>
> Thanks,
>
> Richard
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
