[R] efficiently picking one row from a data frame per unique key

James Kebinger Mon, 12 Apr 2010 18:35:05 -0700

Hello all, I'm trying to transform data frames by grouping the rows by the
values in a particular column, ordered by another column, then picking the
first row in each group.


I'd like to convert a data frame like this:

x  y  z
1 10 20
1 11 19
2 12 18
4 13 17

into one with three rows, like this, where I've discarded one row:

 x  y  z
1 1 11 19
2 2 12 18
4 4 13 17

I've got a solution using aggregate, but it gets very slow with any volume
of data: the run seems to be mostly IO bound, and it never finishes with a
data set of about 6 MB.

Here's how I'm currently trying to do this:

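# Example data: x is the grouping key, y decides which row to keep
d = data.frame(x=c(1,1,2,4), y=c(10,11,12,13), z=c(20,19,18,17))
# Sort by descending y so the row to keep comes first within each key
d.ordered = d[order(-d$y),]
# Within each x group, take the first element of every column
aggregate(d.ordered, by=list(key=d.ordered$x), FUN=function(x){x[1]})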
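
A possible alternative, offered only as a sketch and not timed against the
larger data set: once the frame is sorted, duplicated() flags every repeat
of a key, so negating it keeps the first row per key without calling a
function once per group. The name first.rows is made up for illustration.

```r
# Example data from the post
d <- data.frame(x = c(1, 1, 2, 4), y = c(10, 11, 12, 13), z = c(20, 19, 18, 17))

# Sort by key, then by descending y, so the preferred row leads each group
d.ordered <- d[order(d$x, -d$y), ]

# duplicated() is FALSE for the first occurrence of each key and TRUE for
# later repeats, so !duplicated() keeps exactly one row per key
first.rows <- d.ordered[!duplicated(d.ordered$x), ]
print(first.rows)
```

The result keeps the original row names of the selected rows, so they will
not be consecutive.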

I've tried to use split and unsplit, but unsplit complained about duplicate
row names when reassembling the sub frames.
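
For reference, a minimal sketch of the split-based idea that reassembles the
pieces with do.call(rbind, ...) instead of unsplit. The helper pick.top and
the name result are hypothetical, and this may still be slow with many
distinct keys, since it builds one small data frame per key.

```r
# Example data from the post
d <- data.frame(x = c(1, 1, 2, 4), y = c(10, 11, 12, 13), z = c(20, 19, 18, 17))

# One sub-frame per distinct value of x
groups <- split(d, d$x)

# Hypothetical helper: keep the row with the largest y in a sub-frame
pick.top <- function(g) g[which.max(g$y), , drop = FALSE]

# Apply the helper to each sub-frame and stack the one-row results
result <- do.call(rbind, lapply(groups, pick.top))
print(result)
```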

Thanks for your suggestions.

-james


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
