On Dec 9, 2009, at 2:59 PM, Peng Yu wrote:

On Tue, Dec 8, 2009 at 10:37 PM, David Winsemius <dwinsem...@comcast.net > wrote:

On Dec 8, 2009, at 11:28 PM, Peng Yu wrote:

I have the following code, which tests split on a data.frame and
split on each column (as a vector) separately. The runtimes differ by
a factor of about 10. When m and k increase, the difference becomes
even bigger.

I'm wondering why the performance on data.frame is so bad. Is it a bug
in R? Can it be improved?

You might want to look at the data.table package. The author claims
significant speed improvements over data.frames.

'data.table' doesn't seem to help. You can try the other set of m,n,k.
In both cases, using as.data.frame is faster than using as.data.table.

Please let me know if I understand what you meant.

I was only suggesting that you look at it because it appeared to have efficiency advantages in other situations. As it turned out, that structure offered no advantage when I tested it.

--
David.



m=10
n=6
k=3

#m=300000
#n=6
#k=30000

set.seed(0)
x=replicate(n,rnorm(m))
f=sample(1:k, size=m, replace=TRUE)

library(data.table)
Loading required package: ref
dim(refdata) and dimnames(refdata) no longer allow parameter ref=TRUE,
use dim(derefdata(refdata)), dimnames(derefdata(refdata)) instead
system.time(split(as.data.frame(x),f))
  user  system elapsed
 0.000   0.000   0.003
system.time(split(as.data.table(x),f))
  user  system elapsed
 0.010   0.000   0.011

system.time(split(as.data.frame(x),f))

 user  system elapsed
 1.700   0.010   1.786

system.time(lapply(
      1:dim(x)[[2]]
      , function(i) {
        split(x[,i],f)
      }
      )
  )
 user  system elapsed
 0.170   0.000   0.167

###########
m=30000
n=6
k=3000

set.seed(0)
x=replicate(n,rnorm(m))
f=sample(1:k, size=m, replace=TRUE)

system.time(split(as.data.frame(x),f))

system.time(lapply(
      1:dim(x)[[2]]
      , function(i) {
        split(x[,i],f)
      }
      )
  )
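
One possible explanation, not confirmed anywhere in this thread, is that split on a data.frame subsets the whole frame once per group, so its cost grows with the number of groups k, whereas splitting a plain vector is much cheaper. The sketch below assumes that explanation and shows a workaround: split the row indices once and subset the matrix x directly. The names idx, by_df and by_mat are made up for this illustration, and the sizes are reduced so that it runs quickly.

```r
# Smaller sizes than in the thread (m = 30000, k = 3000) so this runs fast.
m <- 3000
n <- 6
k <- 300

set.seed(0)
x <- replicate(n, rnorm(m))                 # m-by-n numeric matrix
f <- sample(1:k, size = m, replace = TRUE)  # group label for each row

# Reference result: split the data.frame directly, as in the original post.
by_df <- split(as.data.frame(x), f)

# Workaround (an assumption, not the poster's method): split the row
# indices once, then take the matching rows of the matrix for each group.
idx <- split(seq_len(m), f)
by_mat <- lapply(idx, function(i) x[i, , drop = FALSE])

# Check that both approaches give the same groups with the same values.
same_names <- identical(names(by_df), names(by_mat))
same_values <- all(mapply(function(d, mm) all(as.matrix(d) == mm),
                          by_df, by_mat))

# Timings vary by machine and R version, so none are quoted here.
print(system.time(split(as.data.frame(x), f)))
print(system.time(lapply(split(seq_len(m), f),
                         function(i) x[i, , drop = FALSE])))
```

Each element of by_mat is a matrix rather than a data.frame; if a data.frame is needed for a particular group, as.data.frame can be applied to just that element.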

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


