[R] faster version of split()?

2009-01-16 Thread Simon Pickett
Hi all,

I want to calculate the number of unique observations of y in each level of x from my data frame df. This does the job, but it is very slow for this big data frame (159503 rows, 11 columns).

    group.list <- split(df$y, df$x)
    count <- function(x) length(unique(na.omit(x)))

Re: [R] faster version of split()?

2009-01-16 Thread Henrique Dallazuanna
Maybe:

    with(df, tapply(y, x, count))

On Fri, Jan 16, 2009 at 8:10 AM, Simon Pickett simon.pick...@bto.org wrote:

> Hi all, I want to calculate the number of unique observations of y in each level of x from my data frame df. this does the job but it is very slow for this big data frame
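A minimal sketch of the tapply() suggestion, run on made-up data. The data frame below is an assumption for illustration, not Simon's data, and the sapply() step is also an assumption, since the original post is cut off after the definition of count. It only checks that both routes give the same counts; it says nothing about timing.

```r
# Made-up data: x has three levels, y contains some missing values.
set.seed(1)
df <- data.frame(x = sample(7:9, 100, replace = TRUE),
                 y = sample(c(1:5, NA), 100, replace = TRUE))

# count() as defined in the original post.
count <- function(x) length(unique(na.omit(x)))

# Route 1 (assumed completion of the original post): split, then apply count.
via_split <- sapply(split(df$y, df$x), count)

# Route 2: Henrique's one-liner.
via_tapply <- with(df, tapply(y, x, count))

# Both give one count per level of x.
all.equal(as.vector(via_split), as.vector(via_tapply))
```

tapply() still splits the data internally, so it is mainly a shorter way to write the same computation.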

Re: [R] faster version of split()?

2009-01-16 Thread r...@quantide.com
    df = data.frame(x = sample(7:9, 100, replace = TRUE), y = sample(1:5, 100, replace = TRUE))
    fun = function(x) length(unique(x))
    by(df$y, df$x, fun)

Simon Pickett wrote:

> Hi all, I want to calculate the number of unique observations of y in each level of x from my data frame df. this does the job but it

Re: [R] faster version of split()?

2009-01-16 Thread Søren Højsgaard
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Simon Pickett
Sent: 16 January 2009 11:10
To: R help
Subject: [R] faster version of split()?

> Hi all, I want to calculate the number of unique observations of y in each level of x from my data frame df. this does

Re: [R] faster version of split()?

2009-01-16 Thread David Winsemius
Henrique's solution seems sensible. Another might be:

    df = data.frame(x = sample(7:9, 10, replace = TRUE), y = sample(1:5, 10, replace = TRUE))
    table(df)

       y
    x   1 2 3 4 5
      7 1 0 1 0 2
      8 0 1 0 0 1
      9 0 1 1 2 0

    rowSums(table(df) > 0)

    7 8 9
    3 2 3

    # same as Henrique's count
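A short sketch of the table() idea applied to a data frame with more than two columns, as in the original question. The data and the extra column z are made up for illustration. With 11 columns, table(df) would cross-tabulate every column, so the sketch passes only x and y. By default table() drops NA values, which matches the na.omit() in the original count function.

```r
# Made-up data with an extra column and some missing y values.
set.seed(2)
df <- data.frame(x = sample(7:9, 100, replace = TRUE),
                 y = sample(c(1:5, NA), 100, replace = TRUE),
                 z = rnorm(100))

# Cross-tabulate only x and y; NA in y is excluded by default.
tab <- table(df$x, df$y)

# A cell > 0 means that value of y occurs in that level of x,
# so the row sums count the distinct y values per level of x.
via_table <- rowSums(tab > 0)

# Reference result using count() from the original post.
count <- function(x) length(unique(na.omit(x)))
via_tapply <- tapply(df$y, df$x, count)

all(via_table == via_tapply)
```

The table has one cell per combination of x and y, so this route is only practical when the number of distinct values in both columns is moderate.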

Re: [R] faster version of split()?

2009-01-16 Thread Peter Dalgaard
Simon Pickett wrote:

> Hi all, I want to calculate the number of unique observations of y in each level of x from my data frame df. this does the job but it is very slow for this big data frame (159503 rows, 11 columns).
>
>     group.list <- split(df$y, df$x)
>     count <- function(x)