Millions of rows can be a problem if everything is loaded into memory,
depending on the type of data. Numeric columns should be fine, but if you
have string columns and want to process based on them (string comparisons
and so on), it will be slow.
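As a rough illustration of the difference (a sketch; the exact sizes depend on your R version and platform), you can compare the in-memory footprint of a numeric column against a character column of the same length with `object.size`:

```r
# Compare memory footprint of numeric vs. character columns of 1e6 rows.
# Unique strings are the worst case; repeated strings share storage in
# modern R via the string cache, so real data may sit in between.
n <- 1e6
num_col <- rnorm(n)                    # 1e6 doubles: about 8 MB
chr_col <- paste0("id_", seq_len(n))   # 1e6 distinct strings: much larger

print(object.size(num_col), units = "MB")  # ~8 MB
print(object.size(chr_col), units = "MB")  # considerably larger
```

Beyond raw size, operations like `==` on a character vector compare strings element by element, which is slower than the same comparison on numerics.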
You may want to combine sources outside R (in stored procedures, perhaps)
and then load the result into R. Joining data within R code can be costly
if you are selecting from a data frame based on a string.
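To make the cost concrete, here is a small sketch (data and function names are made up for illustration): selecting rows with `df$key == k` scans and string-compares the whole column on every lookup, whereas splitting the data frame by key once up front turns repeated lookups into a cheap list access.

```r
# Hypothetical data: 1e5 rows keyed by a string column
set.seed(1)
df <- data.frame(key   = paste0("grp_", sample(1:100, 1e5, replace = TRUE)),
                 value = rnorm(1e5),
                 stringsAsFactors = FALSE)

# Per-lookup string comparison: a full O(n) scan of the key column each time
slow_lookup <- function(k) df[df$key == k, ]

# Do the work once: split into a named list of per-key data frames,
# then each lookup is a single list subscript
by_key <- split(df, df$key)
fast_lookup <- function(k) by_key[[k]]
```

If the join itself can be pushed down to the RDBMS, that is usually better still; this pattern only helps when the data must live in R.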
Personally, I have run into 'out of memory' problems only beyond 1 GB of
data, on a 32-bit Windows system with 3 GB of RAM. The same happens with
C++ as well.
Regarding speed, I find MATLAB faster than R for matrix operations; in
other areas they are in the same range. R is much better to program in,
as it has a much more complete programming language.
R can use multiple cores/CPUs with a suitable multi-threaded linear
algebra library, though this helps only for linear algebra operations.
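The kind of operation that benefits is a large dense matrix product; the snippet below times one. Whether extra cores are actually used depends on which BLAS your R build is linked against (e.g. ATLAS or another threaded BLAS, set up at install time), not on the R code itself.

```r
# A dense matrix multiply is handed off to the BLAS; with a multi-threaded
# BLAS linked in, this is where additional cores pay off.
m <- matrix(rnorm(1000 * 1000), nrow = 1000, ncol = 1000)
system.time(p <- m %*% m)   # elapsed time drops with a threaded BLAS
```

Element-wise loops, string handling, and data-frame manipulation do not go through the BLAS and stay single-threaded regardless.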
A 64-bit binary of R is not available for Windows.
Sankalp
Jeff Royce wrote:
We are new to R and evaluating if we can use it for a project we need to
do. We have read that R is not well suited to handle very large data
sets. Assuming we have the data prepped and stored in an RDBMS (Oracle,
Teradata, SQL Server), what can R reasonably handle from a volume
perspective? Are there some guidelines on memory/machine sizing based
on data volume? We need to be able to handle Millions of Rows from
several sources. Any advice is much appreciated. Thanks.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.