On Tue, 16 May 2006, Robert Citek wrote:

>
> On May 16, 2006, at 8:15 AM, justin bem wrote:
>
>> Try to open your db with MySQL and use RMySQL
>
> I've seen this offered up as a suggestion a few times but with little
> detail.  In my experience, even using SQL to pull in data from a
> MySQL DB, R would need to load the entire data set into RAM before
> doing some calculations.  But perhaps I'm using RMySQL incorrectly[1].
>
> As a toy problem, let's imagine a data set (foo) with a single
> numerical field (bar) and 1 billion records (1e9).  In MySQL one
> would do the following to calculate the mean:
>
> select avg(bar) from foo ;
>
> For a smaller data set I would issue a select statement and then
> fetch the entire set into a data frame before calculating the mean.
> Given such a large data set, how would one calculate the mean using R
> connected to this MySQL database?  How would one calculate the median
> using R connected to this MySQL database?
>
> Pointers to references appreciated.
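For the toy problem in the quoted question, here is a minimal sketch of letting the database do the reduction so that only summaries cross into the client. It uses Python's standard-library sqlite3 module as a stand-in for R connected to MySQL; that substitution is an assumption made so the example runs without a server, and the SQL is the part that carries over. The table foo and column bar follow the question; the function names are hypothetical. The mean comes back from AVG as a single number. For the median, the database sorts (ideally via an index) and returns only the one or two middle rows. A chunked loop shows the alternative of streaming rows in fixed-size batches instead of fetching the whole column at once.

```python
import sqlite3


def make_toy_db(values):
    """Hypothetical helper: build an in-memory stand-in for the table foo(bar)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE foo (bar REAL)")
    con.executemany("INSERT INTO foo (bar) VALUES (?)", ((v,) for v in values))
    # An index on bar lets ORDER BY bar walk the index instead of sorting.
    con.execute("CREATE INDEX foo_bar ON foo (bar)")
    return con


def db_mean(con):
    """Mean computed inside the database: one number is returned."""
    (mean,) = con.execute("SELECT AVG(bar) FROM foo").fetchone()
    return mean


def chunked_mean(con, chunk_size=1000):
    """Mean from rows fetched in fixed-size chunks, so client memory stays bounded."""
    cur = con.execute("SELECT bar FROM foo WHERE bar IS NOT NULL")
    total, count = 0.0, 0
    while True:
        rows = cur.fetchmany(chunk_size)
        if not rows:
            break
        total += sum(r[0] for r in rows)
        count += len(rows)
    return total / count if count else None


def db_median(con):
    """Median: the database orders the rows; at most two values are returned."""
    (n,) = con.execute("SELECT COUNT(bar) FROM foo").fetchone()  # COUNT(bar) skips NULLs
    if n == 0:
        return None
    offset = (n - 1) // 2       # index of the (lower) middle row
    limit = 1 if n % 2 else 2   # odd n: one middle row; even n: two
    rows = con.execute(
        "SELECT bar FROM foo WHERE bar IS NOT NULL ORDER BY bar LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return sum(r[0] for r in rows) / len(rows)


if __name__ == "__main__":
    con = make_toy_db([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
    print(db_mean(con), chunked_mean(con, chunk_size=2), db_median(con))
```

On a billion-row table the ORDER BY ... LIMIT/OFFSET query still makes the server sort or walk an index, so it is not free; what it avoids is moving the whole column into the client's memory.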
Well, there *is* a manual about R Data Import/Export, and this does
discuss using R with DBMSs with examples.  How about reading it?

The point being made is that you can import just the columns you need,
and indeed summaries of those columns.

> [1] http://www.sourcekeg.co.uk/cran/src/contrib/Descriptions/RMySQL.html

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
