On Fri, Aug 17, 2007 at 01:53:25PM -0400, Ravi Varadhan wrote:
> The simplest trick is to use the QR decomposition:
>
> The OLS solution (X'X)^{-1}X'y can be easily computed as:
> qr.solve(X, y)
While I agree that this is the correct way to solve the linear algebra problem, I seem to be missing why re-implementing the existing lm function (which undoubtedly uses a QR decomposition internally) will solve the problem that was mentioned, namely the massive amount of memory the process consumes. 2e6 rows by 200 columns by 8 bytes per double is about 3.2e9 bytes, i.e. roughly 3 gigs of minimum memory consumption just to hold the matrix. The QR decomposition, or any other solving process, will at least double this to 6 gigs, and it would be unsurprising for overhead to push peak memory usage to 8 gigs.

I'm going to assume that the original user has perhaps 1.5 to 2 gigs available, so any process that even reads in a matrix of more than about 1 million rows will exceed the available memory. Hence my suggestion to randomly downsample the matrix by a factor of 10, and then bootstrap the coefficients by repeating the downsampling process 20, 50, or 100 times to take advantage of all of the data available.

Now that I'm aware of the biglm package, I think that it is probably preferable.

-- 
Daniel Lakeland
[EMAIL PROTECTED]
http://www.street-artists.org/~dlakelan

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
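To make the downsample-and-bootstrap suggestion concrete, here is a minimal sketch in R. The real problem has ~2e6 rows by 200 columns; the data frame `dat` below (and the predictors `x1`, `x2`) is a small simulated stand-in of my own invention, not the original poster's data.

```r
## Downsample by a factor of 10, fit lm() on each subsample, and repeat
## to bootstrap the coefficient estimates.  `dat` is a hypothetical
## stand-in for the full data set, which would not fit in memory.
set.seed(1)
n <- 10000                              # pretend this is the 2e6-row data
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 3 * dat$x2 + rnorm(n)

k <- 10                                 # downsampling factor
B <- 50                                 # number of bootstrap repetitions

coefs <- replicate(B, {
  idx <- sample(n, n %/% k)             # random 10% subsample of the rows
  coef(lm(y ~ x1 + x2, data = dat[idx, ]))
})

rowMeans(coefs)                         # point estimates across resamples
apply(coefs, 1, sd)                     # their spread across resamples
```

Each individual fit only touches n/k rows at a time, so peak memory stays a factor of ~10 below a full-data lm() fit, at the cost of some statistical efficiency; biglm avoids even that by processing the data in chunks.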