On Tue, Apr 08, 2008 at 09:26:22AM -0500, Jeff Royce wrote:

> We are new to R and evaluating if we can use it for a project we need to
> do. We have read that R is not well suited to handle very large data
> sets. Assuming we have the data prepped and stored in an RDBMS (Oracle,
> Teradata, SQL Server), what can R reasonably handle from a volume
> perspective? Are there some guidelines on memory/machine sizing based
> on data volume? We need to be able to handle millions of rows from
> several sources.
As so often, the answer is "it depends". R does not have an inherent maximum number of rows it can deal with - the available memory determines how big a dataset you can fit into RAM. So often the answer would be "yes - just buy more RAM". A couple million rows are no problem at all if you don't have too many columns (done that).

If you really have a very large set of data which you cannot fit into memory, you may still be able to use R: Do you really need ALL the data in memory at the same time? Often, very large datasets actually contain many different subsets of data which you want to analyze separately anyway. In that case, storing the full data in an RDBMS and selecting the required subsets as needed is usually the best approach.

In your situation, I would simply load the full dataset into R and see what happens.

cu
Philipp

--
Dr. Philipp Pagel                                Tel. +49-8161-71 2131
Lehrstuhl für Genomorientierte Bioinformatik     Fax. +49-8161-71 2186
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
and
Institut für Bioinformatik und Systembiologie / MIPS
Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
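[Editor's note: a minimal sketch of the subset-from-the-RDBMS approach described in the reply. It uses the DBI and RSQLite packages with an in-memory database so it is self-contained and runnable; for Oracle, Teradata, or SQL Server you would swap in the appropriate backend (e.g. the odbc or ROracle package), keeping the DBI calls the same. The table and column names here are made up for illustration.]

```r
library(DBI)
library(RSQLite)

## Self-contained stand-in for a real RDBMS: an in-memory SQLite database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

## Pretend this is the large table already prepped in the database.
## (Hypothetical table "measurements" with a grouping column "site".)
dbWriteTable(con, "measurements",
             data.frame(site  = rep(c("A", "B"), each = 5000),
                        value = rnorm(10000)))

## Instead of reading all rows into R, let the database do the filtering
## and pull in only the subset you actually want to analyze.
subset_a <- dbGetQuery(con,
                       "SELECT value FROM measurements WHERE site = 'A'")

## Only 5000 of the 10000 rows ever reach R's memory.
nrow(subset_a)
summary(subset_a$value)

dbDisconnect(con)
```

The same pattern scales to millions of rows: each `dbGetQuery()` only has to fit the requested subset into RAM, not the full table.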