Olivier,

I have thought of adding something like this to a package, but here is my 
current thinking on the issue.

This question (or a similar one) has been asked a few times, so there is some
demand for a general answer. I see three approaches:

1. Have an example of the necessary steps archived in a publicly available 
place.
2. Write a function and include it in a non-core package.
3. Add it to the core of R or a core package.

Number 1 is already in progress, since these e-mails will be part of the
archive, though someone is welcome to add it to the Wiki as well if they think
that would be useful.

Your suggestion is number 3, but I would argue that 2 is better than 3 for the
simple reason that anything added to the core is implied to be of top quality
and to have pretty much every option that most people would think of. Putting
it in a non-core package makes it available with fewer implications about
quality.

The question then becomes: what options do we make available? Do we have the
user specify the entire correlation structure, or just assume the new variables
will be independent of each other? What should the function do if the requested
correlations result in a matrix that is not positive definite? What if the user
wants to have 2 fixed variables? And there are other questions.

My current thinking is that the process is simple enough that it is easier to
do this by hand than to remember all the options to the function. There are
currently people who use bootstrap and permutation tests without loading the
packages that implement them, because it is quicker to write the code by hand
than to remember the syntax of the functions. I think this type of data
generation falls under the same situation. But if you or someone else thinks
there is enough justification for a function to do this, and can specify what
options it should have, I will be happy to add it to my TeachingDemos package.
(This seems an appropriate place, since one of the places where I want to
generate data with a specific correlation structure is when creating examples
for students.)
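
To illustrate how little code the by-hand route needs in the simplest case
(one new variable with a chosen sample correlation to an existing vector),
here is a short R sketch. Note that it swaps in a different technique from the
Cholesky code in this thread: the noise is made exactly uncorrelated with `x`
by taking residuals from `lm()`, and standardized `x` and noise are then mixed
with weights `r` and `sqrt(1 - r^2)`. The seed, sample size, and `r = 0.6` are
arbitrary choices for the example.

```r
set.seed(2007)                      # arbitrary seed, for reproducibility only
x <- rnorm(100, mean = 15, sd = 5)  # the existing variable
r <- 0.6                            # desired sample correlation with x

# Noise that is exactly uncorrelated with x: residuals of a regression on x
z <- rnorm(length(x))
e <- residuals(lm(z ~ x))

# Mix standardized x and standardized noise; the weights r and sqrt(1 - r^2)
# give unit variance and a sample correlation of exactly r with x
y <- r * as.vector(scale(x)) + sqrt(1 - r^2) * as.vector(scale(e))

# Optional: give y the same mean and sd as x (the correlation is unaffected)
y <- y * sd(x) + mean(x)

cor(x, y)  # 0.6 up to floating-point error
```

Because the residuals are orthogonal to both the intercept and `x`, the sample
correlation is exactly `r` rather than only approximately so, and the final
linear rescaling of `y` does not change it.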


Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Olivier ETERRADOSSI
> Sent: Thursday, April 05, 2007 3:09 AM
> To: [email protected]
> Subject: Re: [R] Generate a serie of new vars that correlate with existing var
> 
> Hello, list,
> Why not add the smart proposal by Greg Snow as a built-in function in
> {stats}, just changing the "x234" and "newc" lines to allow for more
> distributions to be generated?
> Or am I missing an already existing function that does this?
> Regards. Olivier
> 
> 
> # slight modification of the original code by Greg Snow
> # [mailto:[EMAIL PROTECTED] on April 04, 2007 1:46 AM
> 
> # generates ndistr vectors with the same mean and sd as x1 and with
> # the given correlation coefficients to x1
> # input :
> #         x1 : a vector
> #         ndistr : number of distributions
> #         coefs : vector of ndistr correlation coefficients
> 
> CorelSets <- function(x1 = rnorm(100, 15, 5), ndistr = 3,
>                       coefs = c(0.4, 0.5, 0.6)) {
>   stopifnot(length(coefs) == ndistr)
> 
>   # x2, x3, and x4 in a matrix; these will be modified to meet the criteria
>   x234 <- scale(matrix(rnorm(ndistr * length(x1)), ncol = ndistr))
> 
>   # put all into 1 matrix for simplicity
>   x1234 <- cbind(scale(x1), x234)
> 
>   # find the current correlation matrix
>   c1 <- var(x1234)
> 
>   # cholesky decomposition to get independence
>   chol1 <- solve(chol(c1))
> 
>   newx <- x1234 %*% chol1
> 
>   # check that we have independence and x1 unchanged
>   # (inside a function, bare expressions would be silently discarded)
>   stopifnot(isTRUE(all.equal(cor(newx), diag(ndistr + 1),
>                              check.attributes = FALSE)),
>             isTRUE(all.equal(x1234[, 1], newx[, 1])))
> 
>   # create new correlation structure
>   newc <- diag(ndistr + 1)
>   newc[1, -1] <- coefs
>   newc[-1, 1] <- coefs
> 
>   # chol() stops with an error if newc is not positive definite
>   chol2 <- chol(newc)
> 
>   finalx <- newx %*% chol2 * sd(x1) + mean(x1)
>   pairs(finalx)
>   finalx  # return the matrix visibly; column 1 is x1 itself
> }
> >
> > Dear Greg,
> > Thanks a million!
> > "As good as it gets"  :)
> > All the best
> > Nguyen
> >
> > -----Original Message-----
> > From: Greg Snow [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, April 04, 2007 1:46 AM
> > To: Nguyen Dinh Nguyen; [email protected]
> > Subject: RE: [R] Generate a serie of new vars that correlate with existing var
> >
> > Here is one way to do it:......8<.................snip.........8<....
> >   
> --
> Olivier ETERRADOSSI
> Maître-Assistant
> CMGD / Equipe "Propriétés Psycho-Sensorielles des Matériaux"
> Ecole des Mines d'Alès
> Hélioparc, 2 av. P. Angot, F-64053 PAU CEDEX 9
> tel std: +33 (0)5.59.30.54.25 tel direct: +33 (0)5.59.30.90.35
> fax: +33 (0)5.59.30.63.68
> http://www.ema.fr
> 
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
