Right now, linreg.c, regression.q and glm.q won't handle large data sets very well. The problem is that the regression and (currently fetal) glm procedures store the entire data set in memory, then pass the data to pspp_linreg (), which finds the least squares estimates.
Storing the entire data set in memory isn't necessary, just easier to code. PSPP could handle much bigger data sets if, in the casereader_read loop, it computed two matrix products from the data in a single pass, then sent that much smaller summary to pspp_linreg () (there is a rough sketch of what I mean at the end of this message). But there may be tasks for which pspp_linreg () should accept all the data as a single matrix, so it should probably be able to do that, too.

My question is: should I do this now, or wait until after the release? It will probably change a lot of code in linreg.c, and could introduce several bugs. The benefit would be that any procedure that needs regression could run with very large data sets.

-Jason
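
P.S. Here is a rough sketch of the single-pass accumulation I have in mind, just to make the idea concrete. The names (ssq_accumulator, ssq_create, ssq_accumulate) are made up for illustration, and I am assuming the two products we want are X'X and X'y; the real version would live inside the casereader_read loop and would still have to deal with missing values, weights, and so on.

/* Sketch: single-pass accumulation of the cross products needed
   for least squares, i.e. X'X (p x p) and X'y (p x 1).  These
   names are hypothetical; in PSPP the accumulation call would sit
   inside the casereader_read loop. */

#include <stdlib.h>

struct ssq_accumulator
{
  size_t p;        /* Number of predictors, including the constant. */
  double *xtx;     /* p*p, row-major: running sums for X'X. */
  double *xty;     /* p: running sums for X'y. */
  size_t n;        /* Number of cases folded in so far. */
};

static struct ssq_accumulator *
ssq_create (size_t p)
{
  struct ssq_accumulator *a = malloc (sizeof *a);
  a->p = p;
  a->xtx = calloc (p * p, sizeof *a->xtx);
  a->xty = calloc (p, sizeof *a->xty);
  a->n = 0;
  return a;
}

/* Fold one case (predictor values x[0..p-1] and dependent value y)
   into the running sums.  Nothing from the case needs to be kept
   afterwards. */
static void
ssq_accumulate (struct ssq_accumulator *a, const double *x, double y)
{
  for (size_t i = 0; i < a->p; i++)
    {
      a->xty[i] += x[i] * y;
      for (size_t j = 0; j < a->p; j++)
        a->xtx[i * a->p + j] += x[i] * x[j];
    }
  a->n++;
}

Once the reader loop finishes, pspp_linreg () would only need the p x p and p x 1 sums to solve the normal equations, so memory use stays O(p^2) no matter how many cases there are.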
