Hi,

Yesterday I analysed data with 160000 rows and 10 columns.
Aggregation would be impossible with a data frame format, but after converting
it to a matrix with *numeric* entries (check that the variables are of class
numeric!) the computation needs only 7 seconds on a Pentium III. I'm sad to
say that this is still slow in comparison with PROC SUMMARY in SAS (less
than one second), but the code is much more elegant in R!
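
For illustration, here is a minimal sketch of the matrix idea, using the
example data from the quoted post below; tapply() and table() on a numeric
matrix are just one way to get the grouped sums and counts, not necessarily
the fastest variant:

dat <- data.frame( Datum     = c( 32586, 32587, 32587, 32625, 32656,
                                  32656, 32656, 32672, 32672, 32699 ),
                   FischerID = c( 58395, 58395, 58395, 88434, 89953,
                                  89953, 89953, 64395, 62896, 62870 ),
                   Anzahl    = c( 2, 2, 1, 1, 2, 1, 7, 1, 1, 2 ) )

m   <- data.matrix(dat)                       # all-numeric matrix
key <- paste(m[, "Datum"], m[, "FischerID"])  # one key per Datum/FischerID pair
t.a <- data.frame( Datum     = tapply(m[, "Datum"],     key, min),
                   FischerID = tapply(m[, "FischerID"], key, min),
                   Anzahl    = tapply(m[, "Anzahl"],    key, sum),
                   Cnt       = as.vector(table(key)) )
t.a <- t.a[order(t.a$Datum, t.a$FischerID), ]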

Best,
Matthias


> Hi,
> 
> I use the code below to aggregate / count my test data. It works fine,
> but the problem is with my real data (33'000 rows), where the function
> is really slow (nothing had happened after half an hour).
> 
> Does anybody know of other functions that I could use?
> 
> Thanks,
> Hans-Peter
> 
> --------------
> dat <- data.frame( Datum     = c( 32586, 32587, 32587, 32625, 32656,
>                                   32656, 32656, 32672, 32672, 32699 ),
>                    FischerID = c( 58395, 58395, 58395, 88434, 89953,
>                                   89953, 89953, 64395, 62896, 62870 ),
>                    Anzahl    = c( 2, 2, 1, 1, 2, 1, 7, 1, 1, 2 ) )
> f <- function(x) data.frame( Datum     = x[1,1],
>                              FischerID = x[1,2],
>                              Anzahl    = sum( x[,3] ),
>                              Cnt       = dim( x )[1] )
> t.a <- do.call("rbind", by(dat, dat[,1:2], f))   # slow for 33'000 rows
> t.a <- t.a[order( t.a[,1], t.a[,2] ),]
> 
>   # show data
> dat
> t.a
> 

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
