Dear Silika:

Do you know what makes the memory requirements so large? Do you have many observations, or is it (as Martin just suggested) "several factors with many levels"? If the latter, and if you have not already done so, I suggest you think very carefully about whether you really want all those (unordered) levels. If you have many levels with only one observation per level, I suggest you first simply delete those observations: you would get residuals == 0 for them anyway, and you can just as well handle that part of the problem manually. If you have many levels with more than one but still very few observations per level, appropriate preparation for the regression might be to convert the unordered factor levels to an ordinal scale, then to numerics, and then regress on a low-order polynomial in that made-up scale. That's old technology, but it can still be quite useful.
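For instance, something along these lines (only a rough sketch; the data frame 'dat' with response 'y' and factor 'f' is made up, and the ordering of the levels is up to you):

## drop levels of f that occur only once, then recode the remaining
## (suitably ordered) levels as numeric scores and fit a low-order
## polynomial instead of one dummy variable per level
tab  <- table(dat$f)
keep <- names(tab)[tab > 1]              # levels with more than one observation
dat2 <- subset(dat, f %in% keep)
dat2$f <- factor(dat2$f)                 # drop the unused levels
## assuming the levels have been put into a sensible order already:
dat2$score <- as.numeric(dat2$f)
fit <- lm(y ~ poly(score, 2), data = dat2)   # quadratic in the made-up scale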

In some applications, science progresses like this: unordered categories get ordered, then transformed to an ordinal scale, and then to a quantitative scale. Checking for outliers might then reveal misplaced levels.

Hope this helps.  Spencer Graves

Martin Maechler wrote:
"AndyL" == Liaw, Andy <[EMAIL PROTECTED]>
   on Mon, 14 Jul 2003 09:33:31 -0400 writes:


    AndyL> How *exactly* did you "run the regression" in R?
    AndyL> There are several ways, and it can make a big
    AndyL> difference for large data sets.  lm() would be the
    AndyL> most expensive option.  If I'm not mistaken, lsfit()
    AndyL> is more "lean and mean".

As a matter of fact, rather use lm.fit(), which is the ``work horse'' of
lm().  lm.fit() and lsfit() are very similar (relying on the same Fortran
QR decomposition), but lm.fit() has by now been tested {by lm() usage}
much more extensively.
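For example, with the design matrix built by hand (a small made-up illustration; the data and sizes here are arbitrary):

n  <- 1e5
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)

X   <- cbind(1, x1, x2)        # numeric design matrix, built "by hand"
fit <- lm.fit(X, y)            # same Fortran QR as lsfit(), but leaner than lm()
fit$coefficients               # compare with coef(lm(y ~ x1 + x2))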


    AndyL> You can even do it more or less by hand, by calling
    AndyL> qr() directly.  There's also a discussion in Venables
    AndyL> & Ripley's "S Programming" on this subject (for S-PLUS).

Section 7.2.  (Actually, the relevant code is not at all S-PLUS specific;
only the final "resources(.)" [CPU, memory] measuring of the solution is.)
It's for the case of one factor with many (107!) levels and otherwise
continuous covariates.  There, one can solve the problem without
constructing the large matrices that lm(), lsfit(), or lm.fit() would all use.
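One way to do this (only a sketch of the general idea, with made-up data, not the code from the book) is to "absorb" the factor by centring the response and the covariates within each of its levels and then regressing the centred response on the centred covariates:

n <- 5000
g <- factor(sample(1:107, n, replace = TRUE))   # one factor with many levels
x <- rnorm(n)
y <- as.numeric(g)/10 + 3*x + rnorm(n)

y.c <- y - ave(y, g)            # centre y within the levels of g
x.c <- x - ave(x, g)            # centre x within the levels of g
fit <- lm.fit(cbind(x.c), y.c)  # slope for x; the factor has been swept out
fit$coefficients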


It becomes really "interesting" if you have (several) factors
with (many) levels...

Regards,
Martin

>> -----Original Message-----
>> From: Silika Tereshchenko [mailto:[EMAIL PROTECTED]
>> Sent: Sunday, July 13, 2003 8:55 AM
>> To: [EMAIL PROTECTED]
>> Subject: [R] Memory size
>>
>> Dear all,
>>
>> I have a problem.  I could not run the regression, because I always get
>> the warning message about "memory.size".  From the help file I learned
>> that it is possible to increase the memory size, but I did not
>> understand how to do it.  Could you please explain it to me?  I would be
>> very grateful.
>>
>> The second question: I obtained from the regression the coefficients
>> "6.003e-3" and "0.0345e+3".  What does that mean?
>>
>> Thanks a lot,
>> Silika




______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

