Re: [R] Practical Data Limitations with R
Dear Jeff,

R worked fine for the 22 rows I tested on a home PC running Windows XP. Memory is limited by the hardware you have, so I suggest beefing up RAM to 2 GB, adding hard disk space, and then working it out. I evaluated R too, on my site www.decisionstats.com, and I found it comparable to, if not better than, SPSS and SAS.

As a beginner, and in corporate projects, try using the *GUI R Commander* or the *data-mining GUI Rattle*: it is faster and will help you skip some steps, and you can look at the generated code side by side to learn the language (see the sketch after this message). I am not sure about the client-server version, but that should work too.

Also look at the book http://oit.utk.edu/scc/RforSASSPSSusers.pdf, which will serve you as a reference guide. The rest of the details are on my site, www.decisionstats.com.

Also *try the software WPS* (http://www.teamwpc.co.uk/products/wps), which uses the SAS language and provides the same functionality at 10-20% of the cost for millions of rows.

Hope this helps,
Ajay

On Tue, Apr 8, 2008 at 7:56 PM, Jeff Royce [EMAIL PROTECTED] wrote:
> We are new to R and evaluating if we can use it for a project we need
> to do. We have read that R is not well suited to handle very large
> data sets. Assuming we have the data prepped and stored in an RDBMS
> (Oracle, Teradata, SQL Server), what can R reasonably handle from a
> volume perspective? Are there some guidelines on memory/machine
> sizing based on data volume? We need to be able to handle millions of
> rows from several sources. Any advice is much appreciated. Thanks.
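A minimal sketch of installing and launching the two GUIs recommended above. Rcmdr and rattle are the actual CRAN package names; the rest is illustrative, not a tested setup:

install.packages(c("Rcmdr", "rattle"))

library(Rcmdr)     # loading Rcmdr opens the R Commander window
library(rattle)
rattle()           # rattle() starts the Rattle data-mining GUI

Both GUIs echo the R code behind every point-and-click action, so you can copy it into a script once you know which steps you need.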
Re: [R] Practical Data Limitations with R
On Tue, Apr 08, 2008 at 09:26:22AM -0500, Jeff Royce wrote:
> We are new to R and evaluating if we can use it for a project we need
> to do. We have read that R is not well suited to handle very large
> data sets. Assuming we have the data prepped and stored in an RDBMS
> (Oracle, Teradata, SQL Server), what can R reasonably handle from a
> volume perspective? Are there some guidelines on memory/machine
> sizing based on data volume? We need to be able to handle millions of
> rows from several sources.

As so often, the answer is: it depends. R does not have an inherent maximum number of rows it can deal with; the available memory determines how big a dataset you can fit into RAM. So often the answer would be: yes, just buy more RAM. A couple of million rows are no problem at all if you don't have too many columns (I've done that).

If you really have a very large set of data which you cannot fit into memory, you may still be able to use R. Do you really need ALL the data in memory at the same time? Often, very large datasets actually contain many different subsets which you want to analyze separately anyway. In that case, what you describe - storing the full data in an RDBMS and selecting the required subsets as needed - is the best solution (see the sketch after this message).

In your situation, I would simply load the full dataset into R and see what happens.

cu
	Philipp

--
Dr. Philipp Pagel                               Tel. +49-8161-71 2131
Lehrstuhl für Genomorientierte Bioinformatik    Fax. +49-8161-71 2186
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany

and

Institut für Bioinformatik und Systembiologie / MIPS
Helmholtz Zentrum München -
Deutsches Forschungszentrum für Gesundheit und Umwelt
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel
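A minimal sketch of the subset-from-RDBMS approach using the RODBC package. The DSN name "warehouse" and the table and column names are hypothetical placeholders; substitute your own:

library(RODBC)

ch <- odbcConnect("warehouse")   # "warehouse" is a hypothetical ODBC DSN
## Let the database do the filtering: pull only the subset to be analyzed
sales_emea <- sqlQuery(ch,
    "SELECT customer_id, sale_date, amount
     FROM sales
     WHERE region = 'EMEA'")
odbcClose(ch)

str(sales_emea)   # an ordinary data frame, sized to the subset, not the table

Only the rows under analysis ever occupy R's memory this way, which is what makes the millions-of-rows case manageable.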
Re: [R] Practical Data Limitations with R
> We are new to R and evaluating if we can use it for a project we need
> to do. We have read that R is not well suited to handle very large
> data sets. Assuming we have the data prepped and stored in an RDBMS
> (Oracle, Teradata, SQL Server), what can R reasonably handle from a
> volume perspective? Are there some guidelines on memory/machine
> sizing based on data volume? We need to be able to handle millions of
> rows from several sources. Any advice is much appreciated. Thanks.

The most important thing is what type of analysis you want to do with the data. Is the algorithm that implements the analysis O(n), O(n log n) or O(n^2)? (See the timing sketch after this message.)

Hadley

--
http://had.co.nz/
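One illustrative way to answer that question empirically (this sketch is not from the original post) is to time the same analysis on growing samples. A simple linear model is used here because its fitting cost grows roughly linearly in the number of rows for a fixed set of predictors:

ns <- c(10^4, 10^5, 10^6)
elapsed <- sapply(ns, function(n) {
  x <- rnorm(n)                         # simulated predictor
  y <- 2 * x + rnorm(n)                 # simulated response
  system.time(lm(y ~ x))[["elapsed"]]   # seconds for the fit alone
})
cbind(n = ns, seconds = elapsed)

If the timings grow much faster than n does, the full dataset may stay out of reach no matter how much RAM the machine has.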
Re: [R] Practical Data Limitations with R
Millions of rows can be a problem if everything is loaded into memory, depending on the type of data. Numeric data should be fine, but if you have string columns and want to process based on them (string comparisons, etc.), it will be slow. You may want to combine the sources outside R - with stored procedures, maybe - and then load the result into R. Joining data within R code can be costly if you are selecting from a data frame based on a string key (see the sketch after this message).

I have personally run into 'out of memory' problems only beyond 1 GB of data, on a 32-bit Windows system with 3 GB of RAM. That happens with C++ also.

Regarding speed, I find MATLAB faster than R for matrix operations; in other areas they are in the same range. R is much better to program in, as it has a much more complete programming language. R can use multiple cores/CPUs with a suitable multi-threaded linear algebra library, though this will only help for linear algebra operations. A 64-bit binary of R is not available for Windows.

Sankalp

Jeff Royce wrote:
> We are new to R and evaluating if we can use it for a project we need
> to do. We have read that R is not well suited to handle very large
> data sets. Assuming we have the data prepped and stored in an RDBMS
> (Oracle, Teradata, SQL Server), what can R reasonably handle from a
> volume perspective? Are there some guidelines on memory/machine
> sizing based on data volume? We need to be able to handle millions of
> rows from several sources. Any advice is much appreciated. Thanks.
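An illustrative sketch (not from the original post) of the string-column cost described above. A factor stores integer codes plus one copy of each unique label, so it is often smaller than the equivalent character column, and selections can be done on the integer codes instead of the strings:

n <- 10^6
keys <- sample(sprintf("cust%05d", 1:5000), n, replace = TRUE)

print(object.size(keys), units = "Mb")           # character column
print(object.size(factor(keys)), units = "Mb")   # integer codes + 5000 labels

## Select rows via the integer codes; level 42 is "cust00042"
## because factor levels sort lexicographically here
codes <- as.integer(factor(keys))
system.time(codes == 42L)               # integer comparison
system.time(keys == "cust00042")        # per-element string comparison

The same idea applies to joins: merging on integer keys is cheaper than merging on raw strings.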