Hi Paul,

On Sun, Aug 18, 2013 at 12:56 AM, Paul Bernal <[email protected]> wrote:
> Thanks a lot for the valuable information.
>
> Now my question would necessarily be, how many columns can R handle,
> provided that I have millions of rows and, in general, what's the
> maximum number of rows and columns that R can effortlessly handle?
This is all determined by your RAM.

Prior to R-3.0, R could only handle vectors of length 2^31 - 1. If you
were working with a matrix, that meant you could only have that many
elements in the entire matrix. If you were working with a data.frame,
you could have 2^31 - 1 rows and, I'd guess, as many columns: since a
data.frame is really a list of vectors, the entire thing doesn't have
to sit in one contiguous (and contiguously addressable) block of
memory.

R-3.0 introduced "long vectors" (search for that section in the
release notes):

https://stat.ethz.ch/pipermail/r-announce/2013/000561.html

On a 64-bit build, a single vector can now hold up to 2^52 elements --
far beyond the old limit. So, if you've got the RAM, you can have a
data.frame/data.table with billions of rows, in theory.

To figure out how much data you can handle on your machine, you need
to know the size of each element type (8 bytes for a double, 4 for an
integer) and the number of elements you will have, so you can
calculate the amount of RAM you need to load it all up.

Lastly, I should mention there are packages that let you work with
"out of memory" data, like bigmemory, biglm, and ff. Look at the HPC
Task View for more info along those lines:

http://cran.r-project.org/web/views/HighPerformanceComputing.html

> Best regards and again thank you for the help,
>
> Paul
>
> On 18/08/2013 02:35, "Steve Lianoglou" <[email protected]> wrote:
>
>> Hi Paul,
>>
>> First: please keep your replies on list (use reply-all when replying
>> to R-help lists) so that others can help, but also so the lists can
>> be used as a resource for others.
>>
>> Now:
>>
>> On Aug 18, 2013, at 12:20 AM, Paul Bernal <[email protected]> wrote:
>>
>> > Can R really handle millions of rows of data?
>>
>> Yup.
>>
>> > I thought it was not possible.
>>
>> Surprise :-)
>>
>> As I type, I'm working with a ~5.5 million row data.table pretty
>> effortlessly.
>> Columns matter too, of course -- RAM is RAM, after all, and you've
>> got to be able to fit the whole thing into it if you want to use
>> data.table. Once loaded, though, data.table enables one to do
>> split/apply/combine calculations over these data quite efficiently.
>> The first time I used it, I was honestly blown away.
>>
>> If you find yourself wanting to work with such data, you could do
>> worse than read through data.table's vignette and FAQ and give it a
>> spin.
>>
>> HTH,
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Computational Biologist
>> Bioinformatics and Computational Biology
>> Genentech

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
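P.S. The element-size arithmetic above can be sketched like so (a
rough estimate assuming R's usual storage sizes -- 8 bytes per double,
4 per integer -- and ignoring per-object overhead; `estimate_gb` is a
made-up helper name, not a built-in):

```r
# Rough RAM estimate for a rows-by-columns table, assuming 8 bytes per
# double column and 4 bytes per integer column (overhead ignored).
estimate_gb <- function(nrow, n_double = 0, n_int = 0) {
  bytes <- nrow * (n_double * 8 + n_int * 4)
  bytes / 1024^3
}

# e.g. 5 million rows, 10 double columns, 5 integer columns:
estimate_gb(5e6, n_double = 10, n_int = 5)  # ~0.47 GB

# The old (pre R-3.0) per-vector limit:
.Machine$integer.max == 2^31 - 1            # TRUE
```

In practice you'll want headroom well beyond that estimate, since many
R operations copy their inputs.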

