Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

Prof Brian Ripley Tue, 01 May 2012 05:48:09 -0700

On 01/05/2012 00:28, Antonio Piccolboni wrote:

Hi,
I was wondering if there is anything more efficient than split to do the
kind of conversion in the subject. If I create a data frame as in


system.time({fd = data.frame(x = 1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep = ""))})
   user  system elapsed
   0.004   0.000   0.004

and then I try to split it

system.time(split(fd, 1:nrow(fd)))

    user  system elapsed
   0.333   0.031   0.415


You will be quick to notice the roughly two orders of magnitude difference
in time between creation and conversion. Granted, it's not written anywhere

Unsurprising when you create three orders of magnitude more data frames, is it? That's a list of 2000 data frames. Try

system.time(for(i in 1:2000) data.frame(x = i, y = rnorm(1), id = paste0("x", i)))

that they should be similar but the latter seems interpreter-slow to me
(split is implemented with a lapply in the data frame case). There is also a
memory issue when I hit about 20000 elements (allocating 3GB when
interrupted). So before I resort to Rcpp, despite the electrifying feeling
of approaching the bare metal and for the sake of getting things done, I
thought I would ask the experts. Thanks

You need to re-think your data structures: 1-row data frames are not sensible.
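
One way to act on this, as a minimal sketch only: if a list of rows is really needed, each row can be held as a plain list rather than as a 1-row data frame, which avoids constructing 2000 data frames. The name `rows` and the per-row list layout are assumptions for illustration, and `stringsAsFactors = FALSE` is set so that `id` stays a character vector.

```r
# Same example data as in the question; id is kept as character, not factor.
fd <- data.frame(x = 1:2000, y = rnorm(2000),
                 id = paste("x", 1:2000, sep = ""),
                 stringsAsFactors = FALSE)

# Walk the three columns in parallel and build one plain list per row.
# No data.frame() call happens inside the loop.
rows <- Map(function(x, y, id) list(x = x, y = y, id = id),
            fd$x, fd$y, fd$id)

# Each element is an ordinary list with the fields of one row.
str(rows[[3]])
```

Timing this with system.time() against split(fd, 1:nrow(fd)) on your own machine would show whether it is fast enough for the purpose. Keeping the data column-wise and indexing the columns directly remains the simpler option where it is possible.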



Antonio


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
