On Tue, Dec 8, 2009 at 10:37 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
> On Dec 8, 2009, at 11:28 PM, Peng Yu wrote:
>
>> I have the following code, which tests the split on a data.frame and
>> the split on each column (as vector) separately. The runtimes differ
>> by a factor of 10. When m and k increase, the difference becomes even
>> bigger.
>>
>> I'm wondering why the performance on data.frame is so bad. Is it a bug
>> in R? Can it be improved?
>
> You might want to look at the data.table package. The author claims
> significant speed improvements over data.frames.

'data.table' doesn't seem to help. You can try the other set of m, n, k
(commented out below). In both cases, using as.data.frame is faster than
using as.data.table. Please let me know whether I understood what you meant.

> m=10
> n=6
> k=3
>
> #m=300000
> #n=6
> #k=30000
>
> set.seed(0)
> x=replicate(n,rnorm(m))
> f=sample(1:k, size=m, replace=T)
>
> library(data.table)
Loading required package: ref
dim(refdata) and dimnames(refdata) no longer allow parameter ref=TRUE, use dim(derefdata(refdata)), dimnames(derefdata(refdata)) instead
> system.time(split(as.data.frame(x),f))
   user  system elapsed
  0.000   0.000   0.003
> system.time(split(as.data.table(x),f))
   user  system elapsed
  0.010   0.000   0.011

>>> system.time(split(as.data.frame(x),f))
>>
>>    user  system elapsed
>>   1.700   0.010   1.786
>>>
>>> system.time(lapply(
>>
>> + 1:dim(x)[[2]]
>> + , function(i) {
>> + split(x[,i],f)
>> + }
>> + )
>> + )
>>    user  system elapsed
>>   0.170   0.000   0.167
>>
>> ###########
>> m=30000
>> n=6
>> k=3000
>>
>> set.seed(0)
>> x=replicate(n,rnorm(m))
>> f=sample(1:k, size=m, replace=T)
>>
>> system.time(split(as.data.frame(x),f))
>>
>> system.time(lapply(
>> 1:dim(x)[[2]]
>> , function(i) {
>> split(x[,i],f)
>> }
>> )
>> )
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.