Right now, linreg.c, regression.q and glm.q won't handle large data sets very well. The problem is that the regression and (currently fetal) glm procedures store the entire data set in memory, then pass the data to pspp_linreg (), which finds the least squares estimates.
Storing the entire data set in memory isn't necessary, just easier to code. PSPP could handle much bigger data sets if, in the casereader_read loop, it computed two matrix products from the data in a single pass, then sent that much smaller summary to pspp_linreg () (there is a rough sketch of what I mean at the end of this message). But there may be tasks for which pspp_linreg () should accept all the data as a single matrix, so it should probably be able to do that, too.

My question is: should I do this now, or wait until after the release? It will probably change a lot of code in linreg.c, and could introduce several bugs. The benefit would be that any procedure that needs regression could run with very large data sets.

-Jason
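
P.S. Here is a rough sketch of the single-pass accumulation I have in mind, just to make the idea concrete. The names (ssq_accumulator, ssq_create, ssq_accumulate) are made up for illustration, and I am assuming the two products we want are X'X and X'y; the real version would live inside the casereader_read loop and would still have to deal with missing values, weights, and so on.

/* Sketch: single-pass accumulation of the cross products needed
   for least squares, i.e. X'X (p x p) and X'y (p x 1).  These
   names are hypothetical; in PSPP the accumulation call would sit
   inside the casereader_read loop. */

#include <stdlib.h>

struct ssq_accumulator
{
  size_t p;        /* Number of predictors, including the constant. */
  double *xtx;     /* p*p, row-major: running sums for X'X. */
  double *xty;     /* p: running sums for X'y. */
  size_t n;        /* Number of cases folded in so far. */
};

static struct ssq_accumulator *
ssq_create (size_t p)
{
  struct ssq_accumulator *a = malloc (sizeof *a);
  a->p = p;
  a->xtx = calloc (p * p, sizeof *a->xtx);
  a->xty = calloc (p, sizeof *a->xty);
  a->n = 0;
  return a;
}

/* Fold one case (predictor values x[0..p-1] and dependent value y)
   into the running sums.  Nothing from the case needs to be kept
   afterwards. */
static void
ssq_accumulate (struct ssq_accumulator *a, const double *x, double y)
{
  for (size_t i = 0; i < a->p; i++)
    {
      a->xty[i] += x[i] * y;
      for (size_t j = 0; j < a->p; j++)
        a->xtx[i * a->p + j] += x[i] * x[j];
    }
  a->n++;
}

Once the reader loop finishes, pspp_linreg () would only need the p x p and p x 1 sums to solve the normal equations, so memory use stays O(p^2) no matter how many cases there are.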
