Re: [R] Generate multivariate normal data with a random correlation matrix

2011-02-10 Thread Rick DeShon
Thanks for the response, Rex.  This is an interesting approach.  The
Choleski decomposition approach that John suggested seems to be an
obvious and direct approach to this problem.  Your approach is less
obvious to me but may be equal or superior to the Choleski
decomposition.

Are all possible correlation matrices of size k equally likely using
your approach?  It would seem so based on your description.  If so, it
is a way cool solution.

Rick




On Thu, Feb 10, 2011 at 12:18 PM,  rex.dw...@syngenta.com wrote:
 If you want a random correlation matrix, why not just generate random data 
 and accept the correlation matrix that you get?  The standard normal 
 distribution in k dimensions is (hyper)spherically symmetric.  If you 
 generate k standard normal N(0,1) variates, you have a point in k-space with 
 direction uniformly distributed on the (k-1)sphere and Gaussian magnitude.  
 If you generate k such, you have a random linear transformation with all 
 sorts of desirable symmetries.  So, if you generate a kxk matrix of standard 
 normal variates, and another nxk standard normal variates, and multiply the 
 two matrices to get n points in k space, that seems to be a pretty good 
 definition of random correlation to me.  I'm sure you can decompose the kxk 
 matrix to get the theoretical distribution, maybe by multiplying it by its 
 transpose and doing an SVD; I'd have to think about that part.
 ... unless you have a particular distribution of correlation matrices in mind 
 to begin with, which doesn't seem to be the case.


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Szumiloski, John
 Sent: Wednesday, February 09, 2011 11:30 AM
 To: r-help@r-project.org
 Cc: Rick DeShon
 Subject: Re: [R] Generate multivariate normal data with a random correlation 
 matrix

 The knee jerk thought I had was to express the correlation matrix as a 
 generic Choleski decomposition, then randomly populate the triangular 
 decomposed matrix.  When you remultiply, you can simply rescale to 1s on the 
 diagonals.  Then rmnorm as usual.

 In R, see ?chol

 If you want to get fancy, you could look at the random distribution you would 
 use for the triangular matrix and play with that, including different 
 distributions for different elements, elements' distributions being 
 conditional on values of previously randomized elements, etc.

 John

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Rick DeShon
 Sent: Wednesday, 09 February, 2011 11:06 AM
 To: r-h...@stat.math.ethz.ch
 Subject: [R] Generate multivariate normal data with a random correlation 
 matrix

 Hi All.

 I'd like to generate a sample of n observations from a k dimensional 
 multivariate normal distribution with a random correlation matrix.

 My solution:
 The lower (or upper) triangle of the correlation matrix has 
 n.tri=(d/2)(d+1)-d entries.
 Take a uniform sample of n.tri possible correlations (runi(n.tr,-.99,.99) 
 Populate a triangle of the matrix with the sampled correlations Mirror the 
 triangle to populate the other triangle forming a symmetric matrix, cormat 
 Sample n observations from a multivariate normal distribution with mean 
 vector=0 and varcov=cormat


 Problem:
 This approach violates the triangle inequality property of correlation 
 matrices.  So, the matrix I've constructed is certainly a valid matrix but it 
 is not a valid correlation matrix and it blows up when you submit it to a 
 random number generator such as rmnorm.  With a small matrix you sometimes 
 get lucky and generate a valid correlation matrix but as you increase d the 
 probability of obtaining a valid correlation matrix drops off quickly.

 So, any ideas on how to construct a correlation matrix with random entries 
 that cover the range (or most of the range) or the correlation [-1,1]?

 Here's the code I've used that won't work.
 
 library(mnormt)
 n - 1000
 d - 50

 n.tri - ((d*(d+1))/2)-d
 r       - runif(n.tri, min=-.5, max=.5)

 cormat - diag(c)
 count1=1
 for (i in 1:c){
       for (j in 1:c){
               if (ij) {
                               cormat[i,j]=r[count1]
                               cormat[j,i]=cormat[i,j]
                               count1=count1+1
                            }
       }
 }
 eigen(cormat)     # if negative eigenvalue, then the matrix violates the 
 triangle inequality

 x -  rmnorm(n, rep(0, c), cormat)  # Sample the data



 Thanks in advance,

 Rick DeShon

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 Notice:  This e-mail message, together with any attachme...{{dropped:11

Re: [R] Generate multivariate normal data with a random correlation matrix

2011-02-10 Thread rex.dwyer
If you want a random correlation matrix, why not just generate random data and 
accept the correlation matrix that you get?  The standard normal distribution 
in k dimensions is (hyper)spherically symmetric.  If you generate k standard 
normal N(0,1) variates, you have a point in k-space with direction uniformly 
distributed on the (k-1)sphere and Gaussian magnitude.  If you generate k such, 
you have a random linear transformation with all sorts of desirable symmetries. 
 So, if you generate a kxk matrix of standard normal variates, and another nxk 
standard normal variates, and multiply the two matrices to get n points in k 
space, that seems to be a pretty good definition of random correlation to me.  
I'm sure you can decompose the kxk matrix to get the theoretical distribution, 
maybe by multiplying it by its transpose and doing an SVD; I'd have to think 
about that part.
... unless you have a particular distribution of correlation matrices in mind 
to begin with, which doesn't seem to be the case.


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Szumiloski, John
Sent: Wednesday, February 09, 2011 11:30 AM
To: r-help@r-project.org
Cc: Rick DeShon
Subject: Re: [R] Generate multivariate normal data with a random correlation 
matrix

The knee jerk thought I had was to express the correlation matrix as a generic 
Choleski decomposition, then randomly populate the triangular decomposed 
matrix.  When you remultiply, you can simply rescale to 1s on the diagonals.  
Then rmnorm as usual.

In R, see ?chol

If you want to get fancy, you could look at the random distribution you would 
use for the triangular matrix and play with that, including different 
distributions for different elements, elements' distributions being conditional 
on values of previously randomized elements, etc.

John

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Rick DeShon
Sent: Wednesday, 09 February, 2011 11:06 AM
To: r-h...@stat.math.ethz.ch
Subject: [R] Generate multivariate normal data with a random correlation matrix

Hi All.

I'd like to generate a sample of n observations from a k dimensional 
multivariate normal distribution with a random correlation matrix.

My solution:
The lower (or upper) triangle of the correlation matrix has n.tri=(d/2)(d+1)-d 
entries.
Take a uniform sample of n.tri possible correlations (runi(n.tr,-.99,.99) 
Populate a triangle of the matrix with the sampled correlations Mirror the 
triangle to populate the other triangle forming a symmetric matrix, cormat 
Sample n observations from a multivariate normal distribution with mean 
vector=0 and varcov=cormat


Problem:
This approach violates the triangle inequality property of correlation 
matrices.  So, the matrix I've constructed is certainly a valid matrix but it 
is not a valid correlation matrix and it blows up when you submit it to a 
random number generator such as rmnorm.  With a small matrix you sometimes get 
lucky and generate a valid correlation matrix but as you increase d the 
probability of obtaining a valid correlation matrix drops off quickly.

So, any ideas on how to construct a correlation matrix with random entries that 
cover the range (or most of the range) or the correlation [-1,1]?

Here's the code I've used that won't work.

library(mnormt)
n - 1000
d - 50

n.tri - ((d*(d+1))/2)-d
r   - runif(n.tri, min=-.5, max=.5)

cormat - diag(c)
count1=1
for (i in 1:c){
   for (j in 1:c){
   if (ij) {
   cormat[i,j]=r[count1]
   cormat[j,i]=cormat[i,j]
   count1=count1+1
}
   }
}
eigen(cormat) # if negative eigenvalue, then the matrix violates the 
triangle inequality

x -  rmnorm(n, rep(0, c), cormat)  # Sample the data



Thanks in advance,

Rick DeShon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Notice:  This e-mail message, together with any attachme...{{dropped:11}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




message may contain confidential information. If you are not the designated 
recipient, please notify the sender immediately, and delete the original and 
any copies. Any use of the message by you is prohibited. 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting

Re: [R] Generate multivariate normal data with a random correlation matrix

2011-02-09 Thread Eik Vettorazzi
Hi Rick,
you can generate an arbitrary matrix X and calculate X'X, which is at
least positive semi-definite (and definite, when X has full rank)

X-matrix(rnorm(16),4)
 covmat-t(X)%*%X
#check
 !any(eigen(covmat)$value0)
#corresponding correlation matrix
cov2cor(covmat)

Random multivariate can be generated with 'mvrnorm' from the MASS-package.


hth.


Am 09.02.2011 17:19, schrieb Rick DeShon:
 Hi All.
 
 I'd like to generate a sample of n observations from a k dimensional
 multivariate normal distribution with a random correlation matrix.
 
 My solution:
 1) The lower (or upper) triangle of the correlation matrix has
 n.tri=(d/2)(d+1)-d entries.
 2) Take a uniform sample of n.tri possible correlations (runi(n.tr,-.99,.99)
 3) Populate a triangle of the matrix with the sampled correlations
 4) Mirror the triangle to populate the other triangle forming a
 symmetric matrix, cormat
 5) Sample n observations from a multivariate normal distribution with
 mean vector=0 and varcov=cormat
 
 
 Problem:
 This approach violates the triangle inequality property of correlation
 matrices.  So, the matrix I've constructed is certainly a valid matrix
 but it is not a valid correlation matrix and it blows up when you
 submit it to a random number generator such as rmnorm.  With a small
 matrix you sometimes get lucky and generate a valid correlation matrix
 but as you increase d the probability of obtaining a valid correlation
 matrix drops off quickly.
 
 So, any ideas on how to construct a correlation matrix with random
 entries that cover the range (or most of the range) or the correlation
 [-1,1]?
 
 Here's the code I've used that won't work.
 
 library(mnormt)
 n - 1000
 d - 50
 
 n.tri - ((d*(d+1))/2)-d
 r   - runif(n.tri, min=-.5, max=.5)
 
 cormat - diag(c)
 count1=1
 for (i in 1:c){
for (j in 1:c){
if (ij) {
cormat[i,j]=r[count1]
cormat[j,i]=cormat[i,j]
count1=count1+1
 }
}
 }
 eigen(cormat) # if negative eigenvalue, then the matrix violates
 the triangle inequality
 
 x -  rmnorm(n, rep(0, c), cormat)  # Sample the data
 
 
 
 Thanks in advance,
 
 Rick DeShon
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generate multivariate normal data with a random correlation matrix

2011-02-09 Thread Szumiloski, John
The knee jerk thought I had was to express the correlation matrix as a generic 
Choleski decomposition, then randomly populate the triangular decomposed 
matrix.  When you remultiply, you can simply rescale to 1s on the diagonals.  
Then rmnorm as usual.

In R, see ?chol

If you want to get fancy, you could look at the random distribution you would 
use for the triangular matrix and play with that, including different 
distributions for different elements, elements' distributions being conditional 
on values of previously randomized elements, etc.  

John

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Rick DeShon
Sent: Wednesday, 09 February, 2011 11:06 AM
To: r-h...@stat.math.ethz.ch
Subject: [R] Generate multivariate normal data with a random correlation matrix

Hi All.

I'd like to generate a sample of n observations from a k dimensional 
multivariate normal distribution with a random correlation matrix.

My solution:
The lower (or upper) triangle of the correlation matrix has n.tri=(d/2)(d+1)-d 
entries.
Take a uniform sample of n.tri possible correlations (runi(n.tr,-.99,.99) 
Populate a triangle of the matrix with the sampled correlations Mirror the 
triangle to populate the other triangle forming a symmetric matrix, cormat 
Sample n observations from a multivariate normal distribution with mean 
vector=0 and varcov=cormat


Problem:
This approach violates the triangle inequality property of correlation 
matrices.  So, the matrix I've constructed is certainly a valid matrix but it 
is not a valid correlation matrix and it blows up when you submit it to a 
random number generator such as rmnorm.  With a small matrix you sometimes get 
lucky and generate a valid correlation matrix but as you increase d the 
probability of obtaining a valid correlation matrix drops off quickly.

So, any ideas on how to construct a correlation matrix with random entries that 
cover the range (or most of the range) or the correlation [-1,1]?

Here's the code I've used that won't work.

library(mnormt)
n - 1000
d - 50

n.tri - ((d*(d+1))/2)-d
r       - runif(n.tri, min=-.5, max=.5)

cormat - diag(c)
count1=1
for (i in 1:c){
       for (j in 1:c){
               if (ij) {
                               cormat[i,j]=r[count1]
                               cormat[j,i]=cormat[i,j]
                               count1=count1+1
                            }
       }
}
eigen(cormat)     # if negative eigenvalue, then the matrix violates the 
triangle inequality

x -  rmnorm(n, rep(0, c), cormat)  # Sample the data



Thanks in advance,

Rick DeShon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Notice:  This e-mail message, together with any attachme...{{dropped:11}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.