Re: [R] get top 50 correlated item from a correlation matrix for each item

2009-02-12 Thread Dimitris Rizopoulos

a possible vectorized solution is the following:

cor.mat <- cor(matrix(rnorm(100*1000), 1000, 100))
p <- 30 # how many top items

n <- ncol(cor.mat)
cmat <- col(cor.mat)
# order within each column by decreasing correlation, then map the
# linear indices back to row indices
ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
dim(ind) <- dim(cor.mat)
ind <- ind[seq(2, p + 1), ] # drop row 1, which is the item itself
out <- cbind(ID = c(col(ind)), ID2 = c(ind))
as.data.frame(cbind(out, cor = cor.mat[out]))
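
For anyone who wants to check that the order() trick above picks the intended neighbours, here is a small sketch that wraps the same steps in a function and compares the result with a plain per-column version. The names top_cor_vec and top_cor_loop, the seed, and the small matrix size are assumptions made for illustration; they are not part of the original post.

```r
# Hypothetical helper: the vectorized steps from the post, wrapped in a function.
top_cor_vec <- function(cor.mat, p) {
  n <- ncol(cor.mat)
  cmat <- col(cor.mat)
  # sort within each column by decreasing correlation, then convert the
  # linear indices back to row indices
  ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
  dim(ind) <- dim(cor.mat)
  ind <- ind[seq(2, p + 1), , drop = FALSE]  # row 1 is the item itself
  out <- cbind(ID = c(col(ind)), ID2 = c(ind))
  as.data.frame(cbind(out, cor = cor.mat[out]))
}

# Hypothetical reference version: one column at a time, no index arithmetic.
top_cor_loop <- function(cor.mat, p) {
  res <- lapply(seq_len(ncol(cor.mat)), function(j) {
    r <- cor.mat[, j]
    r[j] <- -Inf                               # exclude the item itself
    top <- order(r, decreasing = TRUE)[seq_len(p)]
    data.frame(ID = j, ID2 = top, cor = cor.mat[top, j])
  })
  do.call(rbind, res)
}

set.seed(1)                                    # arbitrary seed
cm <- cor(matrix(rnorm(20 * 200), 200, 20))    # small 20 x 20 toy matrix
a <- top_cor_vec(cm, 5)
b <- top_cor_loop(cm, 5)
all.equal(a$ID2, b$ID2)                        # compare neighbour IDs
all.equal(a$cor, b$cor)                        # compare correlations
```

The per-column version is easier to read but calls order() once per item; the vectorized version sorts the whole matrix in a single call, which is presumably why it suits the monthly 3000 x 3000 case.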


I hope it helps.

Best,
Dimitris


Tan, Richard wrote:

Hi,
 
I have a correlation matrix of about 3000 items, i.e., a 3000*3000
matrix.  For each of the 3000 items, I want to get the top 50 items that
have the highest correlation with it (excluding itself) and generate a
data frame with 3 columns like (ID, ID2, cor), where ID is each of the
3000 items repeated 50 times, ID2 is the top 50 items correlated with
ID, and cor is the correlation between ID and ID2.  I know I can use two
for loops to do it, but that is very time consuming considering that a
correlation matrix is generated for each month of the past 20 years.  Is
there a better way to do it?
 
Regards,
 
Richard 



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014



Re: [R] get top 50 correlated item from a correlation matrix for each item

2009-02-12 Thread Tan, Richard
Works like a charm, thank you! 

-----Original Message-----
From: Dimitris Rizopoulos [mailto:d.rizopou...@erasmusmc.nl] 
Sent: Thursday, February 12, 2009 12:11 PM
To: Tan, Richard
Cc: r-help@r-project.org
Subject: Re: [R] get top 50 correlated item from a correlation matrix
for each item

a possible vectorized solution is the following:

cor.mat <- cor(matrix(rnorm(100*1000), 1000, 100))
p <- 30 # how many top items

n <- ncol(cor.mat)
cmat <- col(cor.mat)
ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
dim(ind) <- dim(cor.mat)
ind <- ind[seq(2, p + 1), ]
out <- cbind(ID = c(col(ind)), ID2 = c(ind))
as.data.frame(cbind(out, cor = cor.mat[out]))


I hope it helps.

Best,
Dimitris




Re: [R] get top 50 correlated item from a correlation matrix for each item

2009-02-12 Thread JLucke
A solution using a toy example

library(MASS)  # for mvrnorm()

r <- cor(mvrnorm(1000,mu=rep(0,10),Sigma=diag(10)))  #assume a 10 x 10 matrix

j <- i <- 1:dim(r)[1] #generate matrix indices
lt <- outer(i,j,'>') #get boolean lower triangle
sort(r[lt],decreasing=TRUE)[1:5] #extract top 5 correlations
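
The sort() call above returns only the correlation values, ranked across the whole matrix rather than per item. If the item pairs behind those values are also wanted, a sketch along the following lines could recover them with which(arr.ind = TRUE). To avoid the MASS dependency the toy data here come from plain rnorm() columns, and lower.tri() stands in for the outer() call; the names idx, top and res and the seed are assumptions for illustration.

```r
set.seed(1)                                   # arbitrary seed
r <- cor(matrix(rnorm(1000 * 10), 1000, 10))  # toy 10 x 10 correlation matrix

lt <- lower.tri(r)                      # logical strict lower triangle
idx <- which(lt, arr.ind = TRUE)        # row/col index of each lower-triangle cell
top <- order(r[lt], decreasing = TRUE)[1:5]

# which() and r[lt] both walk the matrix in column-major order,
# so position k in r[lt] corresponds to row k of idx
res <- data.frame(ID = idx[top, "row"], ID2 = idx[top, "col"], cor = r[lt][top])
res
```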



Joseph F. Lucke
Senior Statistician
Research Institute on Addictions
University at Buffalo
SUNY




Tan, Richard r...@panagora.com 
Sent by: r-help-boun...@r-project.org
02/12/2009 11:19 AM

To: r-help@r-project.org
cc:
Subject: [R] get top 50 correlated item from a correlation matrix for each item

Hi,
 
I have a correlation matrix of about 3000 items, i.e., a 3000*3000
matrix.  For each of the 3000 items, I want to get the top 50 items that
have the highest correlation with it (excluding itself) and generate a
data frame with 3 columns like (ID, ID2, cor), where ID is each of the
3000 items repeated 50 times, ID2 is the top 50 items correlated with
ID, and cor is the correlation between ID and ID2.  I know I can use two
for loops to do it, but that is very time consuming considering that a
correlation matrix is generated for each month of the past 20 years.  Is
there a better way to do it?
 
Regards,
 
Richard 
