Re: [R] Practical Data Limitations with R

2008-04-08 Thread ajay ohri
Dear Jeff,

R works fine for the 22 rows that I tested on a home PC with XP. Memory is
limited by the hardware you have, so I suggest beefing up the RAM to 2 GB (and
the hard disk space) and then working it out from there. I evaluated R on my
site www.decisionstats.com and found it comparable to, if not better than,
SPSS and SAS.
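
On Windows you can also check and, within what the OS allows, raise R's memory
ceiling. A quick sketch (the 2047 MB figure is only an example):

memory.limit()            # current limit in MB (Windows only)
memory.size(max = TRUE)   # maximum memory obtained from the OS so far
memory.limit(size = 2047) # request a higher ceiling in MB, if the OS permits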


As a beginner, and in corporate projects, try using the *R Commander* GUI or
the data-mining GUI *Rattle*; they are faster, will help you skip some steps,
and let you look at the generated code side by side to learn the language.
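
For example, a minimal sketch of getting started with the two GUIs (package
names as on CRAN):

install.packages(c("Rcmdr", "rattle"))
library(Rcmdr)   # loading the package opens the R Commander window
library(rattle)
rattle()         # launches the Rattle data-mining GUI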

I am not sure about the client/server version, but that should work too.

Also look at the book http://oit.utk.edu/scc/RforSASSPSSusers.pdf, which will
help you as a reference guide. The rest of the details are on my site
www.decisionstats.com.

Also *try the software WPS* (http://www.teamwpc.co.uk/products/wps), which
uses the SAS language and provides the same functionality at 10-20% of the
cost for millions of rows.


Hope this helps,

Ajay


On Tue, Apr 8, 2008 at 7:56 PM, Jeff Royce wrote:

 We are new to R and evaluating if we can use it for a project we need to
 do.  We have read that R is not well suited to handle very large data
 sets.  Assuming we have the data prepped and stored in an RDBMS (Oracle,
 Teradata, SQL Server), what can R reasonably handle from a volume
 perspective?   Are there some guidelines on memory/machine sizing based
 on data volume?  We need to be able to handle Millions of Rows from
 several sources.  Any advice is much appreciated.  Thanks.



Re: [R] Practical Data Limitations with R

2008-04-08 Thread Philipp Pagel
On Tue, Apr 08, 2008 at 09:26:22AM -0500, Jeff Royce wrote:
 We are new to R and evaluating if we can use it for a project we need to
 do.  We have read that R is not well suited to handle very large data
 sets.  Assuming we have the data prepped and stored in an RDBMS (Oracle,
 Teradata, SQL Server), what can R reasonably handle from a volume
 perspective?   Are there some guidelines on memory/machine sizing based
 on data volume?  We need to be able to handle Millions of Rows from
 several sources.

As so often, the answer is: it depends. R does not have an inherent maximum
number of rows it can deal with; the available memory determines how big a
dataset you can fit into RAM. So, often, the answer is simply yes - just buy
more RAM.
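
As a rough back-of-envelope estimate (the row and column counts here are only
assumed figures), numeric values are stored as 8-byte doubles, so:

rows <- 5e6; cols <- 20
rows * cols * 8 / 2^20   # roughly 763 MB for the raw data, before any copies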

A couple of million rows are no problem at all if you don't have too many
columns (I have done that). If you really have a very large set of data which
you cannot fit into memory, you may still be able to use R: do you really need
ALL the data in memory at the same time? Very large datasets often contain
many different subsets which you want to analyze separately anyway. Storing
the full data in an RDBMS and selecting the required subsets as needed is then
the best approach.
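
A minimal sketch of that pattern using the RODBC package (the DSN, credentials,
table and column names are placeholders):

library(RODBC)
ch <- odbcConnect("my_dsn", uid = "user", pwd = "secret")
# let the database do the filtering; pull only the subset needed right now
dat <- sqlQuery(ch, "SELECT cust_id, amount, sale_date FROM sales
                     WHERE sale_date >= '2008-01-01'")
close(ch)
str(dat)   # arrives as an ordinary data frame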

In your situation, I would simply load the full dataset into R and see
what happens.

cu
Philipp

-- 
Dr. Philipp Pagel  Tel.  +49-8161-71 2131
Lehrstuhl für Genomorientierte Bioinformatik   Fax.  +49-8161-71 2186
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
 
 and
 
Institut für Bioinformatik und Systembiologie / MIPS
Helmholtz Zentrum München -
Deutsches Forschungszentrum für Gesundheit und Umwelt
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel



Re: [R] Practical Data Limitations with R

2008-04-08 Thread hadley wickham
 We are new to R and evaluating if we can use it for a project we need to
  do.  We have read that R is not well suited to handle very large data
  sets.  Assuming we have the data prepped and stored in an RDBMS (Oracle,
  Teradata, SQL Server), what can R reasonably handle from a volume
  perspective?   Are there some guidelines on memory/machine sizing based
  on data volume?  We need to be able to handle Millions of Rows from
  several sources.  Any advice is much appreciated.  Thanks.

The most important thing is what type of analysis you want to do with the
data. Is the algorithm that implements the analysis O(n), O(n log n), or
O(n^2)?
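
One way to find out empirically is to time the intended analysis on growing
subsets of the data; a sketch, with lm() standing in for the real analysis and
arbitrary sizes:

set.seed(1)
for (n in c(1e4, 1e5, 1e6)) {
  x <- rnorm(n)
  y <- 2 * x + rnorm(n)
  print(system.time(fit <- lm(y ~ x)))   # watch how the timings grow with n
}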

Hadley

-- 
http://had.co.nz/



Re: [R] Practical Data Limitations with R

2008-04-08 Thread Sankalp Upadhyay
Millions of rows can be a problem if everything is loaded into memory,
depending on the type of data. Numeric columns should be fine, but if you have
string columns and want to process based on them (string comparisons etc.),
it will be slow.
You may want to combine the sources outside R - in stored procedures, maybe -
and then load the result into R. Joining data within R code can be costly if
you are selecting from a data frame based on a string, as sketched below.
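
A small illustration with made-up data: repeatedly subsetting on a character
key scans every row once per key, whereas grouping once is much cheaper.

n <- 1e6
d <- data.frame(key   = sprintf("cust%05d", sample.int(10000, n, replace = TRUE)),
                value = rnorm(n), stringsAsFactors = FALSE)
# slow pattern: one string comparison over all rows for each key
# totals <- sapply(unique(d$key), function(k) sum(d$value[d$key == k]))
# cheaper: group once, then aggregate
totals <- tapply(d$value, d$key, sum)
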
Personally, I have run into 'out of memory' problems only beyond 1 GB of data,
on a 32-bit Windows system with 3 GB of RAM. That happens with C++ also.
Regarding speed, I find MATLAB faster than R for matrix operations; in other
areas they are in the same range. R is much better to program in, as it has a
much more complete programming language.
R can use multiple cores/CPUs with a suitable multi-threaded linear algebra
(BLAS) library, though this will only apply to linear algebra operations. A
64-bit binary of R is not available for Windows.

Sankalp


Jeff Royce wrote:
 We are new to R and evaluating if we can use it for a project we need to
 do.  We have read that R is not well suited to handle very large data
 sets.  Assuming we have the data prepped and stored in an RDBMS (Oracle,
 Teradata, SQL Server), what can R reasonably handle from a volume
 perspective?   Are there some guidelines on memory/machine sizing based
 on data volume?  We need to be able to handle Millions of Rows from
 several sources.  Any advice is much appreciated.  Thanks.  




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.