Re: [R] using lm() with variable formula

2007-05-21 Thread Vladimir Eremeev

I was solving a similar problem some time ago.
Here is my script.
I had a data frame containing a response and several other variables, which
were assumed to be predictors.
I was trying to choose the best linear approximation.
This approach now seems useless to me, so please don't blame me for it.
However, the script might be useful to you.

<code>
library(forward)

# dfr is a data.frame that contains everything.
# The response variable is named med5x.
# The following lines construct linear models for all possible formulas
# of the form
# med5x~T+a+height
# med5x~a+height+RH
# T, a, RH, etc. are the names of possible predictors.

# dfr was a very large data frame containing a lot of variables;
# here we have chosen only a subset of them.
inputs <- names(dfr)[c(10:30, 1)]

# The linear models were assumed to have at least 11 terms.
for (nc in 11:length(inputs)) {
  # Generate a character vector containing the formulas with nc terms...
  formulas <- paste("med5x", sep = "~",
                    fwd.combn(inputs, nc,
                              fun = function(x) { paste(x, collapse = "+") }))

  # ...and then try to fit every one of them.
  for (f in formulas) {
    lms <- lm(eval(parse(text = f)), data = dfr)

    # Append the formula and its residual sum of squares to a text file.
    cat(file = "linear_models.txt", f, sum(residuals(lms)^2), "\n",
        sep = "\t", append = TRUE)
  }
}
</code>

Hmm, looking back, I see that this is a rather inefficient script.
For example, the inner loop can easily be replaced with one of the apply
functions.
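
A minimal sketch of the apply-style replacement mentioned above, under some
loudly stated assumptions: base R's combn() stands in for fwd.combn(), the
built-in mtcars data stand in for dfr, and the response (mpg) and the four
candidate predictors are arbitrary illustrative choices, not taken from the
original script.

```r
# Illustrative stand-ins (not from the original script): mtcars as the data,
# mpg as the response, four columns as candidate predictors.
dfr      <- mtcars
response <- "mpg"
inputs   <- c("wt", "hp", "disp", "qsec")

# Residual sum of squares for one formula given as a character string.
rss_of <- function(f) sum(residuals(lm(as.formula(f), data = dfr))^2)

# For each model size nc, build all formulas and fit them with sapply().
results <- do.call(rbind, lapply(2:length(inputs), function(nc) {
  rhs      <- combn(inputs, nc, FUN = paste, collapse = "+")
  formulas <- paste(response, rhs, sep = "~")
  data.frame(formula = formulas,
             rss     = sapply(formulas, rss_of, USE.NAMES = FALSE),
             stringsAsFactors = FALSE)
}))

# Smallest residual sum of squares first.
results <- results[order(results$rss), ]
print(results)
```

Ranking by residual sum of squares alone will always favour the largest
model, since adding a term can never increase it, so a criterion such as
AIC() may be a more sensible thing to store in the rss column.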



-- 
View this message in context: 
http://www.nabble.com/using-lm%28%29-with-variable-formula-tf3772540.html#a10716815
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using lm() with variable formula [Broadcast]

2007-05-18 Thread Liaw, Andy
One way to do it is by giving a data frame with the right variables to
lm() as the first argument each time.  If lm() is given a data frame as
the first argument, it will treat the first variable as the LHS and the
rest as the RHS of the formula.

As examples, you can do:

lm(myData[c("height", "weight", "BP", "Cals")])

(The drawback to this is that the formula in the fitted model object
looks a bit strange...)

Andy
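
A minimal sketch of the data-frame approach Andy describes, assuming the
built-in iris data in place of myData; the column choices are arbitrary and
only for illustration. Reordering the selected columns is one way to change
which variable ends up as the response.

```r
# Illustrative data: built-in iris; the column choices are arbitrary.
cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

# The first column is taken as the response, the rest as predictors.
fit1 <- lm(iris[cols])
print(fit1$call)          # the "strange-looking" formula in the fitted object
print(names(coef(fit1)))

# Reordering the columns changes which variable is the response.
fit2 <- lm(iris[rev(cols)])
print(names(coef(fit2)))
```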




[R] using lm() with variable formula

2007-05-17 Thread Chris Elsaesser
New to R; please excuse me if this is a dumb question.  I tried to RTFM;
didn't help.

I want to do a series of regressions over the columns in a data.frame,
systematically varying the response variable and the terms, and not
necessarily including all the non-response columns.  In my case, the
columns are time series. I don't know if that makes a difference; it
does mean I have to call lag() to offset non-response terms. I cannot
assume a specific number of columns in the data.frame; it might be 3, it
might be 20.

My central problem is that the formula given to lm() is different each
time.  For example, say a data.frame had columns with the following
headings:  height, weight, BP (blood pressure), and Cals (calorie intake
per time frame).  In that case, I'd need something like the following:

lm(height ~ weight + BP + Cals)
lm(height ~ weight + BP)
lm(height ~ weight + Cals)
lm(height ~ BP + Cals)
lm(weight ~ height + BP)
lm(weight ~ height + Cals)
etc.

In general, I'll have to read the header to get the argument labels.

Do I have to write several functions, each taking a different number of
arguments?  I'd like to construct a string or list representing the
variables in the formula and apply lm(), so to say.  [I'm mainly a Lisp
programmer, where that part would be very simple. Anyone have a Lisp API
for R? :-}]

Thanks,
chris

Chris Elsaesser, PhD
Principal Scientist, Machine Learning
SPADAC Inc.
7921 Jones Branch Dr. Suite 600  
McLean, VA 22102  

703.371.7301 (m)
703.637.9421 (o)



Re: [R] using lm() with variable formula

2007-05-17 Thread Gabor Grothendieck
Try this:


lm(Sepal.Length ~., iris[1:3])

# or

cn <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
lm(Sepal.Length ~., iris[cn])
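
Building on the `~ .` idea above, here is a rough sketch of how the whole
family of regressions from the question might be enumerated. It assumes the
four numeric columns of the built-in iris data as an illustrative stand-in,
and uses base R's combn() and reformulate(); the names dat and fits are
made up for this example.

```r
# Illustrative choice: the four numeric columns of the built-in iris data.
dat <- iris[1:4]

fits <- list()
for (k in 2:ncol(dat)) {
  # Every subset of k columns...
  for (cols in combn(names(dat), k, simplify = FALSE)) {
    # ...with each member of the subset taking a turn as the response.
    for (resp in cols) {
      f     <- reformulate(".", response = resp)   # e.g. Sepal.Length ~ .
      label <- paste(resp, "~", paste(setdiff(cols, resp), collapse = " + "))
      fits[[label]] <- lm(f, data = dat[cols])
    }
  }
}

length(fits)       # number of models fitted
names(fits)[1:3]   # a few of the generated labels
```

The number of models grows very quickly with the number of columns, so with
20 columns some restriction on the subsets would be needed in practice.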





Re: [R] using lm() with variable formula

2007-05-17 Thread Richard M. Heiberger
tmp <- data.frame(matrix(rnorm(40), 10, 4, dimnames=list(NULL,
                  c("Y","A","B","C"))))
tmp
tmp.form <- paste(names(tmp)[1], paste(names(tmp)[-1], collapse=" + "),
                  sep=" ~ ")
tmp.form
lm(tmp.form, tmp)

The R language is powerful enough to do most of the Lisp-like things
you may want to do.

Rich
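
As a hedged variation on the paste() construction above, the same idea can
be wrapped in a single function that takes any number of predictor names,
which speaks to the "several functions, each taking a different number of
arguments" worry in the question. The helper name fit_lm is hypothetical,
and reformulate() from base R is used here in place of paste().

```r
# Hypothetical helper (not from the thread): fit one regression given
# the response name and a character vector of predictor names.
fit_lm <- function(data, response, predictors) {
  f <- reformulate(predictors, response = response)
  lm(f, data = data)
}

set.seed(1)  # only so that the illustrative data are reproducible
tmp <- data.frame(matrix(rnorm(40), 10, 4,
                         dimnames = list(NULL, c("Y", "A", "B", "C"))))

# Any response and any number of predictors can be passed as strings.
fit <- fit_lm(tmp, "Y", c("A", "C"))
print(formula(fit))
print(coef(fit))
```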



Re: [R] using lm() with variable formula

2007-05-17 Thread Bert Gunter
... and note that if a matrix of responses is on the left of ~ , separate
regressions will be simultaneously fit to each of the columns of the matrix.
Note that this **is** in TFM -- ?lm.


Bert Gunter
Genentech Nonclinical Statistics
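
A small sketch of the matrix-response form described in the note above,
assuming the built-in iris data and an arbitrary choice of columns for
illustration. Each column of the matrix on the left of ~ gets its own
regression on the same right-hand side.

```r
# Illustrative data: built-in iris. Two response columns bound into a matrix.
fit <- lm(cbind(Sepal.Length, Sepal.Width) ~ Petal.Length + Petal.Width,
          data = iris)

class(fit)   # c("mlm", "lm")
coef(fit)    # one column of coefficients per response

# Each column matches the corresponding single-response fit.
single <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)
all.equal(coef(fit)[, "Sepal.Length"], coef(single))
```

This only covers the case where every response shares the same predictors;
varying the right-hand side still needs one of the formula-building
approaches from the other replies.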
