On Thu, 13 Apr 2006, roger koenker wrote:

> Jeff,
>
> I don't know whether this is likely to be feasible, but if you could
> replace calls to lm() with calls to a sparse matrix version of lm()
> either slm() in SparseM or something similar in Matrix, then I
> would think that you should be safe from memory problems. Adapting step
> might be more than you really bargained for though, I don't
> know the code....
It's a simple wrapper that has been used for many model-fitting classes.
All you need is an extractAIC method.

>
> Roger
>
> url:    www.econ.uiuc.edu/~roger        Roger Koenker
> email   [EMAIL PROTECTED]               Department of Economics
> vox:    217-333-4558                    University of Illinois
> fax:    217-244-6678                    Champaign, IL 61820
>
>
> On Apr 13, 2006, at 2:41 PM, Jeffrey Racine wrote:
>
>> Hi.
>>
>> Background - I am working with a dataset involving around 750K
>> observations, where many of the variables (8/11) are unordered
>> factors.
>>
>> The typical model used to model this relationship in the literature has
>> been a simple linear additive model, but this is rejected out of hand by
>> the data. I was asked to model this via kernel methods, but first wanted
>> to play with the parametric specification out of curiosity.
>>
>> I thought it would be interesting to see what type of model stepwise BIC
>> would yield, and have been playing with the step() function (on R-beta
>> due to the factor.scope() problem that has been fixed in the patched and
>> beta version).
>>
>> I am running this on a 64bit box with 32GB of RAM and tons of swap, but
>> am hitting the memory wall as occasionally memory needs grow to ungodly
>> proportions (in the early iterations the program starts out around 8GB
>> but quickly grows to 15GB, then grows from there). This is not due to my
>> using the beta version, as this also arises under R-2.2.1 for what that
>> is worth.
>>
>> My question is whether or not there is some simple way to substantially
>> reduce the memory footprint for this procedure. I took a look at
>> previous posts for step() and memory issues, but still wonder whether
>> there might be a switch or possibly better way of constructing my model
>> that would overcome the memory issues.
>>
>> I include the code below, and any comments or suggestions would be most
>> welcome (besides `what type of idiot lets information criteria determine
>> their model ;-)')
>>
>> Thanks ever so much in advance.
>>
>> -- Jeff
>>
>> ---- Begin ----
>>
>> ## Read in the full data set (n=745466 observations)
>>
>> data <- read.table("../data_header.dat",header=TRUE)
>>
>> ## Create a data frame with all categorical variables declared as
>> ## unordered factors
>>
>> data <- data.frame(logrprice=data$logrprice,
>>                    cgt=factor(data$cgt),
>>                    cag=factor(data$cag),
>>                    gstann=factor(data$gstann),
>>                    fhogann=factor(data$fhogann),
>>                    gstfhog=factor(data$gstfhog),
>>                    luc=factor(data$luc),
>>                    municipality=factor(data$municipality),
>>                    time=factor(data$time),
>>                    distance=data$distance,
>>                    logr=data$logr,
>>                    loginc=data$loginc)
>>
>> ## Estimate a simple linear model (used repeatedly in the literature,
>> ## fails the most simple of model specification tests e.g.,
>> ## resettest())
>>
>> model.linear <- lm(logrprice~.,data=data)
>>
>> ## Now conduct stepwise (BIC) regression using the step() function in
>> ## the stats library. The lower model is the unconditional mean of y,
>> ## the upper having polynomials of up to order 6 in the three
>> ## continuous covariates, with interaction among all variables of
>> ## order 2.
>>
>> n <- nrow(data)
>>
>> model.bic <- step(model.linear,
>>                   scope=list(
>>                     lower=~ 1,
>>                     upper=~ (.
>>                              +I(logr^2)
>>                              +I(logr^3)
>>                              +I(logr^4)
>>                              +I(logr^5)
>>                              +I(logr^6)
>>                              +I(distance^2)
>>                              +I(distance^3)
>>                              +I(distance^4)
>>                              +I(distance^5)
>>                              +I(distance^6)
>>                              +I(loginc^2)
>>                              +I(loginc^3)
>>                              +I(loginc^4)
>>                              +I(loginc^5)
>>                              +I(loginc^6))
>>                     ^2),
>>                   trace=TRUE,
>>                   k=log(n)
>>                   )
>>
>> summary(model.bic)
>>
>> ---- End ----
>> --
>> Professor J. S. Racine         Phone:  (905) 525 9140 x 23825
>> Department of Economics        FAX:    (905) 521-8232
>> McMaster University            e-mail: [EMAIL PROTECTED]
>> 1280 Main St. W.,Hamilton,     URL: http://www.economics.mcmaster.ca/racine/
>> Ontario, Canada. L8S 4M4
>>
>> `The generation of random numbers is too important to be left to chance.'
>>
>> ______________________________________________
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
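
For readers of the archive, a minimal sketch of what an extractAIC method (the one thing the reply above says a model-fitting class needs) might look like. It is not taken from SparseM or from step() itself: the class `toyfit` and its constructor are hypothetical, built on stats::lm.fit() purely for illustration, and the criterion simply mirrors the Gaussian form n * log(RSS / n) + k * edf that is used for lm fits. A real slm-based class would also have to support update() before step() could drive it.

```r
## Hypothetical fitting class "toyfit" (illustration only): a thin wrapper
## around stats::lm.fit() that keeps just the residuals and the rank.
toyfit <- function(formula, data) {
  mf <- model.frame(formula, data)
  X  <- model.matrix(attr(mf, "terms"), mf)
  y  <- model.response(mf)
  z  <- lm.fit(X, y)
  structure(list(residuals = z$residuals, rank = z$rank), class = "toyfit")
}

## extractAIC method for the hypothetical class. It returns c(edf, criterion),
## the two-element vector that the stats generic extractAIC() is documented
## to return.
extractAIC.toyfit <- function(fit, scale = 0, k = 2, ...) {
  n   <- length(fit$residuals)
  edf <- fit$rank
  rss <- sum(fit$residuals^2)
  dev <- if (scale > 0) rss / scale - n else n * log(rss / n)
  c(edf, dev + k * edf)
}

## Small built-in data set, so the sketch runs without the 750K-row file.
fit.toy <- toyfit(mpg ~ wt + hp, data = mtcars)
fit.lm  <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars)

## BIC-type penalty, as in the step() call quoted above: k = log(n)
print(extractAIC(fit.toy, k = log(n)))
print(extractAIC(fit.lm,  k = log(n)))
```

The two printed vectors should agree, which is a quick check that the method reproduces what step() would see for an ordinary lm fit.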