Thanks to Greg for a nice solution to the question posed by Markus.
Now I am going to complicate things a bit...
What if, besides the regression coefficients (b), I also have their
associated standard errors (b +/- se)?
Is it possible to generate data which, in a multivariate regression,
will yield not only predefined r^2 and b values but also their
associated predefined s.e. values?
Thanks for your input!
achaz
Dr. Achaz von Hardenberg
------------------------------------------------------------------------
Centro Studi Fauna Alpina - Alpine Wildlife Research Centre
Servizio Sanitario e della Ricerca Scientifica
Parco Nazionale Gran Paradiso, Degioz, 11, 11010-Valsavarenche (Ao),
Italy
E-mail: [EMAIL PROTECTED]
[EMAIL PROTECTED]
Skype: achazhardenberg
Tel.: +39.0165.905783
Fax: +39.0165.905506
Mobile: +39.328.8736291
------------------------------------------------------------------------
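On the standard-error question above, a partial sketch rather than a full answer. It assumes centered, mutually orthogonal predictors, for which se(b_j) = sigma_hat / sqrt(sum(x_j^2)). Under that assumption each predictor can be rescaled so that the fit reproduces both b and the s.e. values exactly, but R^2 is then implied by the t-values and the residual degrees of freedom instead of being free to choose. All numeric targets below (b0, b, se, sigma, n) are hypothetical values picked for illustration.

```r
# Sketch only: assumes centered, mutually orthogonal predictors.
set.seed(1)                  # arbitrary seed, for reproducibility
n     <- 100
b0    <- 1                   # target intercept (hypothetical)
b     <- c(2, 3)             # target slopes (hypothetical)
se    <- c(0.5, 0.4)         # target standard errors (hypothetical)
sigma <- 10                  # residual standard error, a free choice here
k     <- length(b)
df    <- n - k - 1           # residual degrees of freedom

# orthonormal columns that are also orthogonal to the intercept
Z <- scale(matrix(rnorm(n * k), n, k), center = TRUE, scale = FALSE)
Q <- qr.Q(qr(Z))

# rescale so that sum(x_j^2) = (sigma / se_j)^2
X <- sweep(Q, 2, sigma / se, "*")
colnames(X) <- c("x1", "x2")

# errors exactly orthogonal to the design, rescaled to give sigma_hat = sigma
e <- resid(lm(rnorm(n) ~ X))
e <- e * sigma * sqrt(df / sum(e^2))

y   <- b0 + drop(X %*% b) + e
fit <- lm(y ~ x1 + x2, data = data.frame(y = y, X))
coef(summary(fit))[, 1:2]    # estimates and standard errors

# R^2 is implied by the t-values and df, not chosen
tval       <- b / se
r2_implied <- sum(tval^2) / (sum(tval^2) + df)
summary(fit)$r.squared
```

With correlated predictors the relation between b, s.e. and R^2 is less simple, so this sketch does not settle the general case.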
On 19 Nov 2008, at 17:29, Greg Snow wrote:
Try this:
# generate x's
x1 <- sample(100, 100, TRUE)
x2 <- sample(100, 100, TRUE)
# generate yhat with b0=1, b1=2, b2=3
yhat <- 1 + 2*x1 + 3*x2
# compute ssr
ssr <- sum( (yhat-mean(yhat))^2 )
# generate errors
e <- rnorm(100)
e <- resid( lm( e ~ x1 + x2 ) )
# to get R^2 of 0.8, ssr/(ssr+sse)=0.8 so sse=0.2/0.8*ssr
e <- e* sqrt(0.2/0.8*ssr/(sum(e^2)))
# now for y
y <- yhat + e
# put into a data frame and test
mydata <- data.frame( y=y, x1=x1, x2=x2 )
fit <- lm(y ~ x1 + x2, data=mydata )
summary(fit)
Now just change the values that you want changed to match your
situation. It does not matter how the x's are generated, so
include more, include polynomials, include interactions, etc.
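As an illustration of that last point, here is a minimal sketch of the same recipe with a quadratic term and an interaction. The coefficient values, the target R^2 of 0.6, and the uniform predictors are assumptions made up for the example. The one thing to watch is that the errors are made orthogonal to every term that appears in the fitted model.

```r
# Sketch: same recipe with a quadratic term and an interaction.
# Coefficients and target R^2 are illustrative assumptions.
set.seed(42)
n  <- 100
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
r2 <- 0.6

# yhat with b0=1, x1: 2, x1^2: -0.5, x2: 3, x1*x2: 0.25
yhat <- 1 + 2*x1 - 0.5*x1^2 + 3*x2 + 0.25*x1*x2
ssr  <- sum((yhat - mean(yhat))^2)

# errors orthogonal to all model terms, then scaled so that
# ssr/(ssr+sse) = r2, i.e. sse = (1-r2)/r2 * ssr
e <- resid(lm(rnorm(n) ~ x1 + I(x1^2) + x2 + x1:x2))
e <- e * sqrt((1 - r2) / r2 * ssr / sum(e^2))

y   <- yhat + e
fit <- lm(y ~ x1 + I(x1^2) + x2 + x1:x2)
coef(fit)
summary(fit)$r.squared
```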
Hope this helps,
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
801.408.8111
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:r-sig-teaching-
[EMAIL PROTECTED] On Behalf Of markus
Sent: Wednesday, November 19, 2008 1:19 AM
To: [email protected]
Subject: [R-sig-teaching] Simulating Data with predefined reg-coefficients and R2
Hi all at the R-teaching mailing list,
I am currently preparing my first R-based regression course. Along
the way I encountered the following problem:
I want to simulate multivariate data that has some specific predefined
attributes. For example, I want to produce a predictor matrix (X)
and a response vector (y) that will yield a given vector of regression
coefficients (b) and a given R2 when I perform a multivariate linear
regression on the dataset. This is best described by the well-known
equation
y=X*b+e.
As a next step I also want to simulate polynomial relationships, but I
think that should not work very differently.
I already searched the web and found some hints, but no clear answer.
There is a PDF out there from John H. Walker (Teaching Regression with
simulation), which however does not discuss this particular topic. I also
have a paper from K. Baumann, 'Chance Correlation in variable subset
regression: Influence of the objective function, selection mechanism and
Ensemble averaging', QCS, 2005. There, an 'autoregressive process' is
used to simulate such data.
Now my question is:
Is it really that difficult to simulate such data? Is there perhaps a
package in R facilitating at least parts of this work?
Thanks in advance for the help,
Markus
_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-teaching