On Tue, Dec 8, 2009 at 10:37 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
> On Dec 8, 2009, at 11:28 PM, Peng Yu wrote:
>
>> I have the following code, which tests split() on a data.frame and
>> split() on each column (as a vector) separately. The runtimes differ
>> by a factor of about 10. When m and k increase, the difference becomes
>> even bigger.
>>
>> I'm wondering why the performance on data.frame is so bad. Is it a bug
>> in R? Can it be improved?
>
> You might want to look at the data.table package. The author claims
> significant speed improvements over data.frames.

'data.table' doesn't seem to help. You can also try the other set of
m, n, k (commented out below). In both cases, using as.data.frame is
faster than using as.data.table.

Please let me know if I have understood what you meant.

> m=10
> n=6
> k=3
>
> #m=300000
> #n=6
> #k=30000
>
> set.seed(0)
> x=replicate(n,rnorm(m))
> f=sample(1:k, size=m, replace=T)
>
> library(data.table)
Loading required package: ref
dim(refdata) and dimnames(refdata) no longer allow parameter ref=TRUE,
use dim(derefdata(refdata)), dimnames(derefdata(refdata)) instead
> system.time(split(as.data.frame(x),f))
   user  system elapsed
  0.000   0.000   0.003
> system.time(split(as.data.table(x),f))
   user  system elapsed
  0.010   0.000   0.011
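
For comparison, here is a minimal base-R sketch of a workaround. It assumes that x is still a plain numeric matrix (as created by replicate() above) and that a list of per-group matrices is an acceptable result. The idea is to split the row indices once and then subset the matrix, so that no data.frame subsetting happens per group. The names idx, by_matrix and by_df are only illustrative, and no timings are claimed here.

```r
set.seed(0)
m <- 30000
n <- 6
k <- 3000

x <- replicate(n, rnorm(m))                  # m-by-n numeric matrix
f <- sample(1:k, size = m, replace = TRUE)   # grouping vector

# Split the row indices once, then subset the matrix for each group.
idx <- split(seq_len(m), f)
by_matrix <- lapply(idx, function(i) x[i, , drop = FALSE])

# Reference result: split() on the data.frame, as in the transcript above.
by_df <- split(as.data.frame(x), f)

# Both approaches should produce the same groups in the same order.
stopifnot(identical(names(by_matrix), names(by_df)))

# Compare the two approaches on your own machine.
system.time(lapply(split(seq_len(m), f), function(i) x[i, , drop = FALSE]))
system.time(split(as.data.frame(x), f))
```

If data.frame pieces are really needed, each element of by_matrix can be converted afterwards with as.data.frame(), at additional cost.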

>> > system.time(split(as.data.frame(x),f))
>>    user  system elapsed
>>   1.700   0.010   1.786
>> > system.time(lapply(
>> +         1:dim(x)[[2]]
>> +         , function(i) {
>> +           split(x[,i],f)
>> +         }
>> +         )
>> +     )
>>    user  system elapsed
>>   0.170   0.000   0.167
>>
>> ###########
>> m=30000
>> n=6
>> k=3000
>>
>> set.seed(0)
>> x=replicate(n,rnorm(m))
>> f=sample(1:k, size=m, replace=T)
>>
>> system.time(split(as.data.frame(x),f))
>>
>> system.time(lapply(
>>       1:dim(x)[[2]]
>>       , function(i) {
>>         split(x[,i],f)
>>       }
>>       )
>>   )
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>

