A possible vectorized solution is the following:

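# example data: a 100 x 100 correlation matrix
cor.mat <- cor(matrix(rnorm(100*1000), 1000, 100))
p <- 30 # how many top items to keep per item

n <- ncol(cor.mat)
cmat <- col(cor.mat)
# sort by column, then by decreasing correlation within each column;
# subtracting the column offset turns linear indices into row indices
ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
dim(ind) <- dim(cor.mat)
# row 1 of each column is the item itself (cor = 1), so skip it
ind <- ind[seq(2, p + 1), ]
out <- cbind(ID = c(col(ind)), ID2 = c(ind))
# index cor.mat with the two-column matrix to look up each (ID, ID2) pair
as.data.frame(cbind(out, cor = cor.mat[out]))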
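
For reuse across the monthly matrices mentioned in the question, the steps above could be wrapped in a function and checked against a plain per-column loop. The sketch below is only an illustration: the names top_cor and top_cor_loop and the small test matrix are hypothetical and not part of the code above. It also assumes that each item's self-correlation is the strict maximum of its column, so that dropping the first sorted row removes the item itself.

```r
# Hypothetical wrapper around the vectorized steps shown above.
top_cor <- function(cor.mat, p) {
  n <- ncol(cor.mat)
  cmat <- col(cor.mat)
  # row indices of each column, ordered by decreasing correlation
  ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
  dim(ind) <- dim(cor.mat)
  # drop row 1 (assumed to be the item itself) and keep the next p rows
  ind <- ind[seq(2, p + 1), , drop = FALSE]
  out <- cbind(ID = c(col(ind)), ID2 = c(ind))
  as.data.frame(cbind(out, cor = cor.mat[out]))
}

# Hypothetical reference version: one order() call per column.
top_cor_loop <- function(cor.mat, p) {
  res <- lapply(seq_len(ncol(cor.mat)), function(j) {
    r <- cor.mat[, j]
    r[j] <- -Inf                                  # exclude the item itself
    top <- order(r, decreasing = TRUE)[seq_len(p)]
    data.frame(ID = j, ID2 = top, cor = cor.mat[top, j])
  })
  do.call(rbind, res)
}

# Small made-up example: 20 items, top 5 each.
set.seed(1)
cm <- cor(matrix(rnorm(20 * 200), 200, 20))
a <- top_cor(cm, 5)
b <- top_cor_loop(cm, 5)

# Both versions should pick the same partners in the same order.
all(a$ID == b$ID, a$ID2 == b$ID2)
all.equal(a$cor, b$cor)
```

With one correlation matrix per month stored in a list, something like lapply(list_of_matrices, top_cor, p = 50) would then apply the same steps to each month; list_of_matrices is again a made-up name.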


I hope it helps.

Best,
Dimitris


Tan, Richard wrote:
Hi,
I have a correlation matrix of about 3000 items, i.e., a 3000 x 3000
matrix.  For each of the 3000 items, I want to get the top 50 items that
have the highest correlation with it (excluding itself) and generate a
data frame with 3 columns ("ID", "ID2", "cor"), where ID is each of the
3000 items repeated 50 times, ID2 is the top 50 items correlated with
that ID, and cor is the correlation between ID and ID2.  I know I can use
two for loops to do it, but that is very time consuming considering that a
correlation matrix is generated for each month of the past 20 years.  Is
there a better way to do it?
Regards, Richard

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
