Re: [R] Pairwise n for large correlation tables?

2006-08-11 Thread Adam D. I. Kramer

On Tue, 8 Aug 2006, [EMAIL PROTECTED] wrote:

> Try this:
>
> # mat is test matrix
> mat <- matrix(1:25, 5)
> mat[2,2] <- mat[3,4] <- NA
> crossprod(!is.na(mat))

Exactly what I was looking for! Thanks.

--Adam
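
For anyone who wants the crossprod() idea with the same interface as the original pairwise.n(), here is a minimal sketch. It is an illustration only, not code posted in this thread: the rewritten function body and the small data frame `d` are made up for the example, and it assumes the input can be coerced with as.matrix().

```r
# hypothetical rewrite of pairwise.n() built on crossprod()
pairwise.n <- function(df) {
  present <- !is.na(as.matrix(df))  # logical matrix, column names are kept
  crossprod(present)                # [i, j] = rows where i and j are both observed
}

# small made-up data frame for illustration
d <- data.frame(col1 = c(1, NA, 3, 4),
                col2 = c(NA, 2, 3, 4),
                col3 = c(1, 2, 3, NA))
n <- pairwise.n(d)
print(n)
```

The result carries the column names on both dimensions, so it lines up cell for cell with the matrix returned by cor(d, use="pairwise.complete.obs").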




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Pairwise n for large correlation tables?

2006-08-07 Thread Adam D. I. Kramer
Hello,

I'm using a very large data set (n > 100,000 for 7 columns), for which I'm
pretty happy dealing with pairwise-deleted correlations to populate my
correlation table. E.g.,

a <- cor(cbind(col1, col2, col3), use="pairwise.complete.obs")

...however, I am interested in the number of cases used to compute each
cell of the correlation table. I am unable to find such a function via
Google searches, so I wrote one of my own. This turns out to be highly
inefficient (i.e., it takes much, MUCH longer than the correlations do). Any
hints regarding other functions to use, or ways to make this speedier, would
be much appreciated!

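pairwise.n <- function(df=stop("Must provide data frame!")) {
   # coerce matrices and other inputs to a data frame
   if (!is.data.frame(df)) {
     df <- as.data.frame(df)
   }
   colNum <- ncol(df)
   result <- matrix(data=NA, nrow=colNum, ncol=colNum,
                    dimnames=list(colnames(df), colnames(df)))
   # fill the upper triangle with the pairwise counts
   for (i in 1:colNum) {
     for (j in i:colNum) {
       # picks every cell in rows where both columns are present,
       # so divide by colNum to get the number of rows
       result[i,j] <- length(df[!is.na(df[i]) & !is.na(df[j])])/colNum
     }
   }
   result
}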

--
Adam D. I. Kramer
University of Oregon



Re: [R] Pairwise n for large correlation tables?

2006-08-07 Thread Gabor Grothendieck
Try this:

# mat is test matrix
mat <- matrix(1:25, 5)
mat[2,2] <- mat[3,4] <- NA
crossprod(!is.na(mat))
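
A note on why this gives the pairwise n: !is.na(mat) is a logical matrix that is TRUE wherever a value is present, and crossprod(m) computes t(m) %*% m with TRUE/FALSE treated as 1/0. Entry [i, j] is therefore the number of rows in which columns i and j are both non-missing. The sketch below checks one cell against a direct count on the same test matrix; the names `present`, `n.pair` and `direct` are made up for illustration.

```r
mat <- matrix(1:25, 5)
mat[2, 2] <- mat[3, 4] <- NA

present <- !is.na(mat)        # TRUE where a value is observed
n.pair <- crossprod(present)  # same as t(present) %*% present

# direct count for columns 2 and 4, for comparison
direct <- sum(!is.na(mat[, 2]) & !is.na(mat[, 4]))

print(n.pair)
print(direct)  # 3: row 2 is lost to column 2, row 3 to column 4
```

The diagonal holds the per-column counts of non-missing values, and the matrix is symmetric, so no explicit loop over column pairs is needed.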





Re: [R] Pairwise n for large correlation tables?

2006-08-07 Thread Christos Hatzis
Hi,

You can use complete.cases().
It should run faster than the code you suggested.

See the following example:

x <- matrix(runif(30),10,3)

# introduce missing values
x[sample(1:10,3),1] <- NA
x[sample(1:10,3),2] <- NA
x[sample(1:10,3),3] <- NA

cor(x,use="pairwise.complete.obs")

n <- ncol(x)
n.na <- matrix(0, n, n)
for (i in seq(1, n)) {
    n.na[i,i] <- sum( complete.cases(x[, i]) )
    for (j in seq(i+1, length=n-i)) {
        ok <- sum( complete.cases(x[, i], x[, j]) )
        n.na[i,j] <- n.na[j,i] <- ok
    }
}
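
One way to check that this loop agrees with the crossprod(!is.na(x)) approach suggested elsewhere in the thread is to run both on the same data. This is a sketch only: the seed is arbitrary and the name `n.cp` is made up for the comparison.

```r
set.seed(1)  # arbitrary seed so the check is repeatable
x <- matrix(runif(30), 10, 3)

# introduce missing values, three per column
x[sample(1:10, 3), 1] <- NA
x[sample(1:10, 3), 2] <- NA
x[sample(1:10, 3), 3] <- NA

# loop version built on complete.cases()
n <- ncol(x)
n.na <- matrix(0, n, n)
for (i in seq_len(n)) {
  n.na[i, i] <- sum(complete.cases(x[, i]))
  for (j in seq(i + 1, length.out = n - i)) {
    n.na[i, j] <- n.na[j, i] <- sum(complete.cases(x[, i], x[, j]))
  }
}

# vectorised version for comparison
n.cp <- crossprod(!is.na(x))

print(all(n.na == n.cp))
```

Both versions count the same thing; the loop makes one complete.cases() call per pair of columns, while crossprod() does the whole table in a single matrix product.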
 
HTH

-Christos

