Re: [R] Pairwise n for large correlation tables?
On Tue, 8 Aug 2006, [EMAIL PROTECTED] wrote:

> Try this:
>
> # mat is test matrix
> mat <- matrix(1:25, 5)
> mat[2,2] <- mat[3,4] <- NA
> crossprod(!is.na(mat))

Exactly what I was looking for! Thanks.

--Adam

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Pairwise n for large correlation tables?
Hello,

I'm using a very large data set (n > 100,000 for 7 columns), for which I'm pretty happy dealing with pairwise-deleted correlations to populate my correlation table. E.g.,

    a <- cor(cbind(col1, col2, col3), use="pairwise.complete.obs")

...however, I am interested in the number of cases used to compute each cell of the correlation table. I am unable to find such a function via Google searches, so I wrote one of my own. This turns out to be highly inefficient (e.g., it takes much, MUCH longer than the correlations do). Any hints, regarding other functions to use or ways to make this speedier, would be much appreciated!

    pairwise.n <- function(df=stop("Must provide data frame!")) {
      if (!is.data.frame(df)) {
        df <- as.data.frame(df)
      }
      colNum <- ncol(df)
      # one cell per pair of columns, labelled with the column names
      result <- matrix(data=NA, nrow=colNum, ncol=colNum,
                       dimnames=list(colnames(df), colnames(df)))
      # only the upper triangle (including the diagonal) is filled
      for (i in 1:colNum) {
        for (j in i:colNum) {
          # count rows where both columns are non-missing
          result[i,j] <- length(df[!is.na(df[i]) & !is.na(df[j])])/colNum
        }
      }
      result
    }

--
Adam D. I. Kramer
University of Oregon
Re: [R] Pairwise n for large correlation tables?
Try this:

    # mat is test matrix
    mat <- matrix(1:25, 5)
    mat[2,2] <- mat[3,4] <- NA
    crossprod(!is.na(mat))
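The one-liner above works because !is.na(mat) is a logical matrix that is TRUE wherever a value is observed, and crossprod(A) computes t(A) %*% A. Entry [i, j] of that product is the sum over rows of A[r, i] * A[r, j], which is the number of rows observed in both column i and column j. A minimal sketch wrapping this as a function follows; the name pairwise_n is hypothetical and does not appear in the thread, and the as.matrix() call is an assumption added so that data frames are accepted too.

```r
# Hypothetical helper (not from the thread): pairwise complete-case counts.
pairwise_n <- function(x) {
  ok <- !is.na(as.matrix(x))  # TRUE where a value is observed
  crossprod(ok)               # t(ok) %*% ok: [i, j] counts rows observed in both columns
}

# Same test matrix as in the reply above
mat <- matrix(1:25, 5)
mat[2, 2] <- mat[3, 4] <- NA
n <- pairwise_n(mat)

# Columns 2 and 4 are jointly observed in rows 1, 4 and 5
n[2, 4]
```

The diagonal holds the number of non-missing values per column, so the same matrix also gives the per-variable n.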
Re: [R] Pairwise n for large correlation tables?
Hi,

You can use complete.cases. It should run faster than the code you suggested. See the following example:

    x <- matrix(runif(30), 10, 3)
    # introduce missing values
    x[sample(1:10,3),1] <- NA
    x[sample(1:10,3),2] <- NA
    x[sample(1:10,3),3] <- NA
    cor(x, use="pairwise.complete.obs")

    n <- ncol(x)
    n.na <- matrix(0, n, n)
    for (i in seq(1, n)) {
      # diagonal: non-missing values in column i
      n.na[i,i] <- sum( complete.cases(x[, i]) )
      # off-diagonal: rows complete in both columns i and j
      for (j in seq(i+1, length=n-i)) {
        ok <- sum( complete.cases(x[, i], x[, j]) )
        n.na[i,j] <- n.na[j,i] <- ok
      }
    }

HTH
-Christos

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Adam D. I. Kramer
Sent: Monday, August 07, 2006 10:04 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Pairwise n for large correlation tables?
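As a sanity check, the complete.cases loop and the crossprod one-liner suggested elsewhere in this thread should produce identical counts. The sketch below compares them on the same kind of test data. The set.seed() call is an assumption added only so the example is reproducible, and seq_len() and length.out= are spelled out in place of the shorter forms used in the message.

```r
set.seed(1)  # assumption: not in the thread, added for reproducibility
x <- matrix(runif(30), 10, 3)
for (k in 1:3) x[sample(1:10, 3), k] <- NA  # 3 missing values per column

# Loop version, as in the message above
n <- ncol(x)
n.na <- matrix(0, n, n)
for (i in seq_len(n)) {
  n.na[i, i] <- sum(complete.cases(x[, i]))
  for (j in seq(i + 1, length.out = n - i)) {
    n.na[i, j] <- n.na[j, i] <- sum(complete.cases(x[, i], x[, j]))
  }
}

# Matrix-product version: one call instead of a double loop
n.cp <- crossprod(!is.na(x))

same <- all(n.na == n.cp)
```

Both versions fill the full symmetric matrix, so either result can be read alongside the output of cor(x, use="pairwise.complete.obs") cell by cell.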