On May 16, 2006, at 8:15 AM, justin bem wrote:
> Try to open your db with MySQL and use RMySQL
I've seen this offered as a suggestion a few times, but with little detail. In my experience, even when using SQL to pull in data from a MySQL DB, R would need to load the entire data set into RAM before doing any calculations. But perhaps I'm using RMySQL incorrectly[1].

As a toy problem, let's imagine a data set (foo) with a single numerical field (bar) and 1 billion records (1e9). In MySQL one would do the following to calculate the mean:

    select avg(bar) from foo ;

For a smaller data set I would issue a select statement and then fetch the entire set into a data frame before calculating the mean.

Given such a large data set, how would one calculate the mean using R connected to this MySQL database? How would one calculate the median using R connected to this MySQL database?

Pointers to references appreciated.

[1] http://www.sourcekeg.co.uk/cran/src/contrib/Descriptions/RMySQL.html

Regards,
- Robert

http://www.cwelug.org/downloads
Help others get OpenSource software. Distribute FLOSS for Windows, Linux, *BSD, and MacOS X with BitTorrent
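P.S. To make the toy problem concrete, here is a minimal sketch of the two obvious ways the mean of foo.bar could be obtained without holding 1e9 values in RAM at once. It is an illustration under assumptions, not a tested recipe: `con` is assumed to be an open DBI connection (for example one created through RMySQL), the helper names `chunked_mean`, `mean_in_db` and `mean_in_chunks` are hypothetical, and bar is assumed to contain no NULLs. The median does not decompose into per-chunk sums and counts, so this sketch does not cover it.

```r
# Running mean over successive chunks.
# `next_chunk` is a hypothetical zero-argument function that returns the next
# numeric vector of values, or a zero-length vector once the data is exhausted.
chunked_mean <- function(next_chunk) {
  total <- 0
  n <- 0
  repeat {
    x <- next_chunk()
    if (length(x) == 0) break
    total <- total + sum(x)   # only the running sum and count are kept
    n <- n + length(x)
  }
  total / n
}

# Option 1: let MySQL do the aggregation and return a single number.
# Assumes `con` is an open DBI connection to the database holding foo.
mean_in_db <- function(con) {
  DBI::dbGetQuery(con, "SELECT AVG(bar) AS mean_bar FROM foo")$mean_bar
}

# Option 2: pull the column in chunks so that only `chunk_rows` values
# sit in R's memory at any one time.
mean_in_chunks <- function(con, chunk_rows = 1e6) {
  res <- DBI::dbSendQuery(con, "SELECT bar FROM foo")
  on.exit(DBI::dbClearResult(res), add = TRUE)
  chunked_mean(function() {
    # a fetch past the end returns zero rows, which ends the loop above
    DBI::dbFetch(res, n = chunk_rows)$bar
  })
}
```

With a live connection the calls would be `mean_in_db(con)` or `mean_in_chunks(con, chunk_rows = 1e6)`. Option 1 moves almost no data; option 2 is only of interest when the calculation cannot be expressed in SQL.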
