Re: [R] Generate multivariate normal data with a random correlation matrix
Thanks for the response, Rex. This is an interesting approach. The Choleski decomposition approach that John suggested seems to be an obvious and direct approach to this problem. Your approach is less obvious to me but may be equal or superior to the Choleski decomposition. Are all possible correlation matrices of size k equally likely using your approach? It would seem so based on your description. If so, it is a way cool solution. Rick On Thu, Feb 10, 2011 at 12:18 PM, rex.dw...@syngenta.com wrote: If you want a random correlation matrix, why not just generate random data and accept the correlation matrix that you get? The standard normal distribution in k dimensions is (hyper)spherically symmetric. If you generate k standard normal N(0,1) variates, you have a point in k-space with direction uniformly distributed on the (k-1)sphere and Gaussian magnitude. If you generate k such, you have a random linear transformation with all sorts of desirable symmetries. So, if you generate a kxk matrix of standard normal variates, and another nxk standard normal variates, and multiply the two matrices to get n points in k space, that seems to be a pretty good definition of random correlation to me. I'm sure you can decompose the kxk matrix to get the theoretical distribution, maybe by multiplying it by its transpose and doing an SVD; I'd have to think about that part. ... unless you have a particular distribution of correlation matrices in mind to begin with, which doesn't seem to be the case. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Szumiloski, John Sent: Wednesday, February 09, 2011 11:30 AM To: r-help@r-project.org Cc: Rick DeShon Subject: Re: [R] Generate multivariate normal data with a random correlation matrix The knee jerk thought I had was to express the correlation matrix as a generic Choleski decomposition, then randomly populate the triangular decomposed matrix. When you remultiply, you can simply rescale to 1s on the diagonals. Then rmnorm as usual. In R, see ?chol If you want to get fancy, you could look at the random distribution you would use for the triangular matrix and play with that, including different distributions for different elements, elements' distributions being conditional on values of previously randomized elements, etc. John -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rick DeShon Sent: Wednesday, 09 February, 2011 11:06 AM To: r-h...@stat.math.ethz.ch Subject: [R] Generate multivariate normal data with a random correlation matrix Hi All. I'd like to generate a sample of n observations from a k dimensional multivariate normal distribution with a random correlation matrix. My solution: The lower (or upper) triangle of the correlation matrix has n.tri=(d/2)(d+1)-d entries. Take a uniform sample of n.tri possible correlations (runi(n.tr,-.99,.99) Populate a triangle of the matrix with the sampled correlations Mirror the triangle to populate the other triangle forming a symmetric matrix, cormat Sample n observations from a multivariate normal distribution with mean vector=0 and varcov=cormat Problem: This approach violates the triangle inequality property of correlation matrices. So, the matrix I've constructed is certainly a valid matrix but it is not a valid correlation matrix and it blows up when you submit it to a random number generator such as rmnorm. With a small matrix you sometimes get lucky and generate a valid correlation matrix but as you increase d the probability of obtaining a valid correlation matrix drops off quickly. So, any ideas on how to construct a correlation matrix with random entries that cover the range (or most of the range) or the correlation [-1,1]? Here's the code I've used that won't work. library(mnormt) n - 1000 d - 50 n.tri - ((d*(d+1))/2)-d r - runif(n.tri, min=-.5, max=.5) cormat - diag(c) count1=1 for (i in 1:c){ for (j in 1:c){ if (ij) { cormat[i,j]=r[count1] cormat[j,i]=cormat[i,j] count1=count1+1 } } } eigen(cormat) # if negative eigenvalue, then the matrix violates the triangle inequality x - rmnorm(n, rep(0, c), cormat) # Sample the data Thanks in advance, Rick DeShon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Notice: This e-mail message, together with any attachme...{{dropped:11
Re: [R] Generate multivariate normal data with a random correlation matrix
If you want a random correlation matrix, why not just generate random data and accept the correlation matrix that you get? The standard normal distribution in k dimensions is (hyper)spherically symmetric. If you generate k standard normal N(0,1) variates, you have a point in k-space with direction uniformly distributed on the (k-1)sphere and Gaussian magnitude. If you generate k such, you have a random linear transformation with all sorts of desirable symmetries. So, if you generate a kxk matrix of standard normal variates, and another nxk standard normal variates, and multiply the two matrices to get n points in k space, that seems to be a pretty good definition of random correlation to me. I'm sure you can decompose the kxk matrix to get the theoretical distribution, maybe by multiplying it by its transpose and doing an SVD; I'd have to think about that part. ... unless you have a particular distribution of correlation matrices in mind to begin with, which doesn't seem to be the case. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Szumiloski, John Sent: Wednesday, February 09, 2011 11:30 AM To: r-help@r-project.org Cc: Rick DeShon Subject: Re: [R] Generate multivariate normal data with a random correlation matrix The knee jerk thought I had was to express the correlation matrix as a generic Choleski decomposition, then randomly populate the triangular decomposed matrix. When you remultiply, you can simply rescale to 1s on the diagonals. Then rmnorm as usual. In R, see ?chol If you want to get fancy, you could look at the random distribution you would use for the triangular matrix and play with that, including different distributions for different elements, elements' distributions being conditional on values of previously randomized elements, etc. John -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rick DeShon Sent: Wednesday, 09 February, 2011 11:06 AM To: r-h...@stat.math.ethz.ch Subject: [R] Generate multivariate normal data with a random correlation matrix Hi All. I'd like to generate a sample of n observations from a k dimensional multivariate normal distribution with a random correlation matrix. My solution: The lower (or upper) triangle of the correlation matrix has n.tri=(d/2)(d+1)-d entries. Take a uniform sample of n.tri possible correlations (runi(n.tr,-.99,.99) Populate a triangle of the matrix with the sampled correlations Mirror the triangle to populate the other triangle forming a symmetric matrix, cormat Sample n observations from a multivariate normal distribution with mean vector=0 and varcov=cormat Problem: This approach violates the triangle inequality property of correlation matrices. So, the matrix I've constructed is certainly a valid matrix but it is not a valid correlation matrix and it blows up when you submit it to a random number generator such as rmnorm. With a small matrix you sometimes get lucky and generate a valid correlation matrix but as you increase d the probability of obtaining a valid correlation matrix drops off quickly. So, any ideas on how to construct a correlation matrix with random entries that cover the range (or most of the range) or the correlation [-1,1]? Here's the code I've used that won't work. library(mnormt) n - 1000 d - 50 n.tri - ((d*(d+1))/2)-d r - runif(n.tri, min=-.5, max=.5) cormat - diag(c) count1=1 for (i in 1:c){ for (j in 1:c){ if (ij) { cormat[i,j]=r[count1] cormat[j,i]=cormat[i,j] count1=count1+1 } } } eigen(cormat) # if negative eigenvalue, then the matrix violates the triangle inequality x - rmnorm(n, rep(0, c), cormat) # Sample the data Thanks in advance, Rick DeShon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Notice: This e-mail message, together with any attachme...{{dropped:11}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. message may contain confidential information. If you are not the designated recipient, please notify the sender immediately, and delete the original and any copies. Any use of the message by you is prohibited. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting
Re: [R] Generate multivariate normal data with a random correlation matrix
Hi Rick, you can generate an arbitrary matrix X and calculate X'X, which is at least positive semi-definite (and definite, when X has full rank) X-matrix(rnorm(16),4) covmat-t(X)%*%X #check !any(eigen(covmat)$value0) #corresponding correlation matrix cov2cor(covmat) Random multivariate can be generated with 'mvrnorm' from the MASS-package. hth. Am 09.02.2011 17:19, schrieb Rick DeShon: Hi All. I'd like to generate a sample of n observations from a k dimensional multivariate normal distribution with a random correlation matrix. My solution: 1) The lower (or upper) triangle of the correlation matrix has n.tri=(d/2)(d+1)-d entries. 2) Take a uniform sample of n.tri possible correlations (runi(n.tr,-.99,.99) 3) Populate a triangle of the matrix with the sampled correlations 4) Mirror the triangle to populate the other triangle forming a symmetric matrix, cormat 5) Sample n observations from a multivariate normal distribution with mean vector=0 and varcov=cormat Problem: This approach violates the triangle inequality property of correlation matrices. So, the matrix I've constructed is certainly a valid matrix but it is not a valid correlation matrix and it blows up when you submit it to a random number generator such as rmnorm. With a small matrix you sometimes get lucky and generate a valid correlation matrix but as you increase d the probability of obtaining a valid correlation matrix drops off quickly. So, any ideas on how to construct a correlation matrix with random entries that cover the range (or most of the range) or the correlation [-1,1]? Here's the code I've used that won't work. library(mnormt) n - 1000 d - 50 n.tri - ((d*(d+1))/2)-d r - runif(n.tri, min=-.5, max=.5) cormat - diag(c) count1=1 for (i in 1:c){ for (j in 1:c){ if (ij) { cormat[i,j]=r[count1] cormat[j,i]=cormat[i,j] count1=count1+1 } } } eigen(cormat) # if negative eigenvalue, then the matrix violates the triangle inequality x - rmnorm(n, rep(0, c), cormat) # Sample the data Thanks in advance, Rick DeShon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Eik Vettorazzi Institut für Medizinische Biometrie und Epidemiologie Universitätsklinikum Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/7410-58243 F ++49/40/7410-57790 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Generate multivariate normal data with a random correlation matrix
The knee jerk thought I had was to express the correlation matrix as a generic Choleski decomposition, then randomly populate the triangular decomposed matrix. When you remultiply, you can simply rescale to 1s on the diagonals. Then rmnorm as usual. In R, see ?chol If you want to get fancy, you could look at the random distribution you would use for the triangular matrix and play with that, including different distributions for different elements, elements' distributions being conditional on values of previously randomized elements, etc. John -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rick DeShon Sent: Wednesday, 09 February, 2011 11:06 AM To: r-h...@stat.math.ethz.ch Subject: [R] Generate multivariate normal data with a random correlation matrix Hi All. I'd like to generate a sample of n observations from a k dimensional multivariate normal distribution with a random correlation matrix. My solution: The lower (or upper) triangle of the correlation matrix has n.tri=(d/2)(d+1)-d entries. Take a uniform sample of n.tri possible correlations (runi(n.tr,-.99,.99) Populate a triangle of the matrix with the sampled correlations Mirror the triangle to populate the other triangle forming a symmetric matrix, cormat Sample n observations from a multivariate normal distribution with mean vector=0 and varcov=cormat Problem: This approach violates the triangle inequality property of correlation matrices. So, the matrix I've constructed is certainly a valid matrix but it is not a valid correlation matrix and it blows up when you submit it to a random number generator such as rmnorm. With a small matrix you sometimes get lucky and generate a valid correlation matrix but as you increase d the probability of obtaining a valid correlation matrix drops off quickly. So, any ideas on how to construct a correlation matrix with random entries that cover the range (or most of the range) or the correlation [-1,1]? Here's the code I've used that won't work. library(mnormt) n - 1000 d - 50 n.tri - ((d*(d+1))/2)-d r - runif(n.tri, min=-.5, max=.5) cormat - diag(c) count1=1 for (i in 1:c){ for (j in 1:c){ if (ij) { cormat[i,j]=r[count1] cormat[j,i]=cormat[i,j] count1=count1+1 } } } eigen(cormat) # if negative eigenvalue, then the matrix violates the triangle inequality x - rmnorm(n, rep(0, c), cormat) # Sample the data Thanks in advance, Rick DeShon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Notice: This e-mail message, together with any attachme...{{dropped:11}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.