I believe this will do: cor.pval2 <- function(x, alternative="two-sided") { corMat <- cor(x, use=if (any(is.na(x))) "pairwise.complete.obs" else "all") df <- crossprod(!is.na(x)) - 2 STATISTIC <- sqrt(df) * corMat / sqrt(1 - corMat^2) p <- pt(STATISTIC, df) p <- if (alternative == "less") { p } else if (alternative == "greater") { 1 - p } else 2 * pmin(p, 1 - p) p }
Some test: > set.seed(1) > x <- matrix(runif(2e3 * 1e2), 2e3) > system.time(res1 <- cor.pval(t(x)), gcFirst=TRUE) [1] 17.28 0.77 18.16 NA NA > system.time(res2 <- cor.pval2(t(x)), gcFirst=TRUE) [1] 19.51 1.05 20.70 NA NA > max(abs(res1 - res2)) [1] 0 > x[c(1, 3, 6), c(2, 4)] <- NA > x[30, 61] <- NA > system.time(res2 <- cor.pval2(t(x)), gcFirst=TRUE) [1] 24.48 0.71 25.28 NA NA This is a bit slower because of the extra computation for "df". One can try to save some computation by only computing with the lower (or upper) triangular part. Cheers, Andy > From: John Fox > > Dear Dren, > > Since cor(), on which Andy's solution is based, can compute > pairwise-present > correlations, you could adapt his function -- you'll have to > adjust the df > for each pair. Alternatively, you could probably save some > time (programming > time + computer time) by just using my solution: > > > R <- diag(100) > > R[upper.tri(R)] <- R[lower.tri(R)] <- .5 > > library(mvtnorm) > > X <- rmvnorm(6000, sigma=R) > > system.time(for (i in 1:50) cor.pvalues(X), gc=TRUE) > [1] 518.19 1.11 520.23 NA NA > > I know that time is money, but nine minutes (on my machine) > probably won't > bankrupt anyone. > > Regards, > John > > -------------------------------- > John Fox > Department of Sociology > McMaster University > Hamilton, Ontario > Canada L8S 4M4 > 905-525-9140x23604 > http://socserv.mcmaster.ca/jfox > -------------------------------- > > > -----Original Message----- > > From: [EMAIL PROTECTED] > > [mailto:[EMAIL PROTECTED] On Behalf Of Dren Scott > > Sent: Monday, April 18, 2005 12:41 PM > > To: 'R-Help' > > Subject: RE: [R] Pearson corelation and p-value for matrix > > > > Hi all, > > > > Thanks Andy, Mark and John for all the help. I really > > appreciate it. I'm new to both R and statistics, so please > > excuse any gaffes on my part. > > > > Essentially what I'm trying to do, is to evaluate for each > > row, how many other rows would have a p-value < 0.05. So, > > after I get my N x N p-value matrix, I'll just filter out > > values that are > 0.05. > > > > Each of my datasets (6000 rows x 100 columns) would consist > > of some NA's. The iterative procedure (cor.pvalues) proposed > > by John would yield the values, but it would take an > > inordinately long time (I have 50 of these datasets to > > process). The solution proposed by Andy, although fast, would > > not be able to incorporate the NA's. > > > > Is there any workaround for the NA's? Or possibly do you > > think I could try something else? > > > > Thanks very much. > > > > Dren > > > > > > "Liaw, Andy" <[EMAIL PROTECTED]> wrote: > > > From: John Fox > > > > > > Dear Andy, > > > > > > That's clearly much better -- and illustrates an > effective strategy > > > for vectorizing (or "matricizing") a computation. I think > I'll add > > > this to my list of programming examples. It might be a little > > > dangerous to pass ... > > > through to cor(), since someone could specify > type="spearman", for > > > example. > > > > Ah, yes, the "..." isn't likely to help here! Also, it will > > only work correctly if there are no NA's, for example (or > > else the degree of freedom would be wrong). > > > > Best, > > Andy > > > > > Thanks, > > > John > > > > > > -------------------------------- > > > John Fox > > > Department of Sociology > > > McMaster University > > > Hamilton, Ontario > > > Canada L8S 4M4 > > > 905-525-9140x23604 > > > http://socserv.mcmaster.ca/jfox > > > -------------------------------- > > > > > > > -----Original Message----- > > > > From: Liaw, Andy [mailto:[EMAIL PROTECTED] > > > > Sent: Friday, April 15, 2005 9:51 PM > > > > To: 'John Fox'; [EMAIL PROTECTED] > > > > Cc: 'R-Help'; 'Dren Scott' > > > > Subject: RE: [R] Pearson corelation and p-value for matrix > > > > > > > > We can be a bit sneaky and `borrow' code from cor.test.default: > > > > > > > > cor.pval <- function(x, alternative="two-sided", ...) { > corMat <- > > > > cor(x, ...) n <- nrow(x) df <- n - 2 STATISTIC <- > > sqrt(df) * corMat > > > > / sqrt(1 - corMat^2) p <- pt(STATISTIC, df) p <- if > > (alternative == > > > > "less") { p } else if (alternative == "greater") { > > > > 1 - p > > > > } else 2 * pmin(p, 1 - p) > > > > p > > > > } > > > > > > > > The test: > > > > > > > > > system.time(c1 <- cor.pvals(X), gcFirst=TRUE) > > > > [1] 13.19 0.01 13.58 NA NA > > > > > system.time(c2 <- cor.pvalues(X), gcFirst=TRUE) > > > > [1] 6.22 0.00 6.42 NA NA > > > > > system.time(c3 <- cor.pval(X), gcFirst=TRUE) > > > > [1] 0.07 0.00 0.07 NA NA > > > > > > > > Cheers, > > > > Andy > > > > > > > > > From: John Fox > > > > > > > > > > Dear Mark, > > > > > > > > > > I think that the reflex of trying to avoid loops in R > is often > > > > > mistaken, and so I decided to try to time the two > > > approaches (on a > > > > > 3GHz Windows XP system). > > > > > > > > > > I discovered, first, that there is a bug in your > > function -- you > > > > > appear to have indexed rows instead of columns; fixing that: > > > > > > > > > > cor.pvals <- function(mat) > > > > > { > > > > > cols <- expand.grid(1:ncol(mat), 1:ncol(mat)) > > matrix(apply(cols, > > > > > 1, > > > > > function(x) cor.test(mat[, x[1]], mat[, > x[2]])$p.value), ncol = > > > > > ncol(mat)) } > > > > > > > > > > > > > > > My function is cor.pvalues and yours cor.pvals. This is > > > for a data > > > > > matrix with 1000 observations on 100 variables: > > > > > > > > > > > R <- diag(100) > > > > > > R[upper.tri(R)] <- R[lower.tri(R)] <- .5 > > > > > > library(mvtnorm) > > > > > > X <- rmvnorm(1000, sigma=R) > > > > > > dim(X) > > > > > [1] 1000 100 > > > > > > > > > > > > system.time(cor.pvalues(X)) > > > > > [1] 5.53 0.00 5.53 NA NA > > > > > > > > > > > > system.time(cor.pvals(X)) > > > > > [1] 12.66 0.00 12.66 NA NA > > > > > > > > > > > > > > > > I frankly didn't expect the advantage of my approach to be > > > > this large, > > > > > but there it is. > > > > > > > > > > Regards, > > > > > John > > > > > > > > > > -------------------------------- > > > > > John Fox > > > > > Department of Sociology > > > > > McMaster University > > > > > Hamilton, Ontario > > > > > Canada L8S 4M4 > > > > > 905-525-9140x23604 > > > > > http://socserv.mcmaster.ca/jfox > > > > > -------------------------------- > > > > > > > > > > > -----Original Message----- > > > > > > From: Marc Schwartz [mailto:[EMAIL PROTECTED] > > > > > > Sent: Friday, April 15, 2005 7:08 PM > > > > > > To: John Fox > > > > > > Cc: 'Dren Scott'; R-Help > > > > > > Subject: RE: [R] Pearson corelation and p-value for matrix > > > > > > > > > > > > Here is what might be a slightly more efficient way to > > > > get to John's > > > > > > question: > > > > > > > > > > > > cor.pvals <- function(mat) > > > > > > { > > > > > > rows <- expand.grid(1:nrow(mat), 1:nrow(mat)) > > matrix(apply(rows, > > > > > > 1, > > > > > > function(x) cor.test(mat[x[1], ], mat[x[2], > > ])$p.value), ncol = > > > > > > nrow(mat)) } > > > > > > > > > > > > HTH, > > > > > > > > > > > > Marc Schwartz > > > > > > > > > > > > On Fri, 2005-04-15 at 18:26 -0400, John Fox wrote: > > > > > > > Dear Dren, > > > > > > > > > > > > > > How about the following? > > > > > > > > > > > > > > cor.pvalues <- function(X){ > > > > > > > nc <- ncol(X) > > > > > > > res <- matrix(0, nc, nc) > > > > > > > for (i in 2:nc){ > > > > > > > for (j in 1:(i - 1)){ > > > > > > > res[i, j] <- res[j, i] <- cor.test(X[,i], > > > > > X[,j])$p.value > > > > > > > } > > > > > > > } > > > > > > > res > > > > > > > } > > > > > > > > > > > > > > What one then does with all of those non-independent test > > > > > > is another > > > > > > > question, I guess. > > > > > > > > > > > > > > I hope this helps, > > > > > > > John > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: [EMAIL PROTECTED] > > > > > > > > [mailto:[EMAIL PROTECTED] On Behalf Of > > > > > Dren Scott > > > > > > > > Sent: Friday, April 15, 2005 4:33 PM > > > > > > > > To: r-help@stat.math.ethz.ch > > > > > > > > Subject: [R] Pearson corelation and p-value for matrix > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > I was trying to evaluate the pearson correlation and > > > > > the p-values > > > > > > > > for an nxm matrix, where each row represents a vector. > > > > > > One way to do > > > > > > > > it would be to iterate through each row, and find its > > > > > correlation > > > > > > > > value( and the p-value) with respect to the other rows. > > > > > Is there > > > > > > > > some function by which I can use the matrix as input? > > > > > > Ideally, the > > > > > > > > output would be an nxn matrix, containing the p-values > > > > > > between the > > > > > > > > respective vectors. > > > > > > > > > > > > > > > > I have tried cor.test for the iterations, but > > > couldn't find a > > > > > > > > function that would take the matrix as input. > > > > > > > > > > > > > > > > Thanks for the help. > > > > > > > > > > > > > > > > Dren > > > > > > > > > > > > > > > > > > > > > > ______________________________________________ > > > > > R-help@stat.math.ethz.ch mailing list > > > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > > > PLEASE do read the posting guide! > > > > > http://www.R-project.org/posting-guide.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------------------------------------------------------- > > > > ---------------- > > > > Notice: This e-mail message, together with any > > attachments, contains > > > > information of Merck & Co., Inc. (One Merck Drive, Whitehouse > > > > Station, New Jersey, USA 08889), and/or its affiliates > > (which may be > > > > known outside the United States as Merck Frosst, Merck > > Sharp & Dohme > > > > or MSD and in Japan, as > > > > Banyu) that may be confidential, proprietary copyrighted and/or > > > > legally privileged. It is intended solely for the use of the > > > > individual or entity named on this message. If you are not the > > > > intended recipient, and have received this message in > > error, please > > > > notify us immediately by reply e-mail and then delete it > > from your > > > > system. > > > > -------------------------------------------------------------- > > > > ---------------- > > > > > > > > > > > > > > > > > -------------------------------------------------------------- > > ---------------- > > > > -------------------------------------------------------------- > > ---------------- > > > > > > --------------------------------- > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! > > http://www.R-project.org/posting-guide.html > > ______________________________________________ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > > ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html