And just for fun, here is a more "functional" version without the explicit 'for()' loops that gives the lower triangular values of the counts of nonmissing values for each correlation:
## (using John's notation) w <- expand.grid(1:nc,1:nc) f <- \(x) ifelse(x[1] <= x[2], NA, nrow(na.omit(dat[,x]))) ans <- matrix(apply(w,1,f), ncol = nc) Note: This is just a matter of taste. I would be amazed if it's any faster than using the explicit loops (they are implicit here, but still there -- and it might be slower!). But maybe it reinforces John's comment about the value of spending some(more) effort to learn how to program in R. Gives you flexibility to program however you like, which usually reduces bugs in one's code, I believe. Cheers, Bert Note: this is just On Fri, Aug 26, 2022 at 9:04 AM John Fox <j...@mcmaster.ca> wrote: > > Dear Val, > > On 2022-08-26 10:41 a.m., Val wrote: > > Hi John and Timothy > > > > Thank you for your suggestion and help. Using the sample data, I did > > carry out a test run and found a difference in the correlation result. > > > > Option 1. > > data_cor <- cor(dat[ , colnames(dat) != "x1"], # Calculate correlations > > dat$x1, method = "pearson", use = "complete.obs") > > resulted > > [,1] > > x2 -0.5845835 > > x3 -0.4664220 > > x4 0.7202837 > > > > Option 2. > > for(i in colnames(dat)){ > > print(cor.test(dat[,i], dat$x1, method = "pearson", use = > > "complete.obs")$estimate) > > } > > [,1] > > x2 -0.7362030 > > x3 -0.04935132 > > x4 0.85766290 > > > > This was crosschecked using Excel and other softwares and all matches > > with option 2. > > One of the factors that contributed for this difference is loss of > > information when we are using na.rm(). This is because that if x2 has > > missing value but x3 and x4 don’t have then na.rm() removed entire > > row information including x3 and x4. > > Yes, I already explained that in my previous message. > > As well, cor() is capable of computing pairwise-complete correlations -- > see ?cor. > > There's not an obvious right answer here, however. Using > pairwise-complete correlations can produce inconsistent (i.e., > non-positive semi-definite) correlation matrices because correlations > are computed on different subsets of the data. > > There are much better ways to deal with missing data. > > > > > My question is there a way to extract the number of rows (N) used in > > the correlation analysis?. > > I'm sure that there are many ways, but here is one that is very > simple-minded and should be reasonably efficient for ~250 variables: > > > (nc <- ncol(dat)) > [1] 4 > > > R <- N <- matrix(NA, nc, nc) > > diag(R) <- 1 > > for (i in 1:(nc - 1)){ > + for (j in (i + 1):nc){ > + R[i, j] <- R[j, i] <-cor(dat[, i], dat[, j], use="complete.obs") > + N[i, j] <- N[j, i] <- nrow(na.omit(dat[, c(i, j)])) > + } > + } > > > round(R, 3) > [,1] [,2] [,3] [,4] > [1,] 1.000 -0.736 -0.049 0.858 > [2,] -0.736 1.000 0.458 -0.428 > [3,] -0.049 0.458 1.000 0.092 > [4,] 0.858 -0.428 0.092 1.000 > > > N > [,1] [,2] [,3] [,4] > [1,] NA 8 8 8 > [2,] 8 NA 8 8 > [3,] 8 8 NA 8 > [4,] 8 8 8 NA > > > round(cor(dat, use="pairwise.complete.obs"), 3) # check > x1 x2 x3 x4 > x1 1.000 -0.736 -0.049 0.858 > x2 -0.736 1.000 0.458 -0.428 > x3 -0.049 0.458 1.000 0.092 > x4 0.858 -0.428 0.092 1.000 > > More generally, I think that it's a good idea to learn a little bit > about R programming if you intend to use R in your work. You'll then be > able to solve problems like this yourself. > > I hope this helps, > John > > > Thank you, > > > > On Mon, Aug 22, 2022 at 1:00 PM John Fox <j...@mcmaster.ca> wrote: > >> > >> Dear Val, > >> > >> On 2022-08-22 1:33 p.m., Val wrote: > >>> For the time being I am assuming the relationship across variables > >>> is linear. I want get the values first and detailed examining of > >>> the relationship will follow later. > >> > >> This seems backwards to me, but I'll refrain from commenting further on > >> whether what you want to do makes sense and instead address how to do it > >> (not, BTW, because I disagree with Bert's and Tim's remarks). > >> > >> Please see below: > >> > >>> > >>> On Mon, Aug 22, 2022 at 12:23 PM Ebert,Timothy Aaron <teb...@ufl.edu> > >>> wrote: > >>>> > >>>> I (maybe) agree, but I would go further than that. There are assumptions > >>>> associated with the test that are missing. It is not clear that the > >>>> relationships are all linear. Regardless of a "significant outcome" all > >>>> of the relationships need to be explored in more detail than what is > >>>> provided in the correlation test. > >>>> > >>>> Multiplicity adjustment as in : > >>>> https://www.sciencedirect.com/science/article/pii/S0197245600001069 is > >>>> not an issue that I can see in these data from the information provided. > >>>> At least not in the same sense as used in the link. > >>>> > >>>> My first guess at the meaning of "multiplicity adjustment" was closer to > >>>> the experimentwise error rate in a multiple comparison procedure. > >>>> https://dictionary.apa.org/experiment-wise-error-rateEssentially, the > >>>> type 1 error rate is inflated the more test you do and if you perform > >>>> enough tests you find significant outcomes by chance alone. There is > >>>> great significance in the Redskins rule: > >>>> https://en.wikipedia.org/wiki/Redskins_Rule. > >>>> > >>>> A simple solution is to apply a Bonferroni correction where alpha is > >>>> divided by the number of comparisons. If there are 250, then 0.05/250 = > >>>> 0.0002. Another approach is to try to discuss the outcomes in a way that > >>>> makes sense. What is the connection between a football team's last home > >>>> game an the election result that would enable me to take another team > >>>> and apply their last home game result to the outcome of a different > >>>> election? > >>>> > >>>> Another complication is if variables x2 through x250 are themselves > >>>> correlated. Not enough information was provided in the problem to know > >>>> if this is an issue, but 250 orthogonal variables in a real dataset > >>>> would be a bit unusual considering the experimentwise error rate > >>>> previously mentioned. > >>>> > >>>> Large datasets can be very messy. > >>>> > >>>> > >>>> Tim > >>>> > >>>> -----Original Message----- > >>>> From: Bert Gunter <bgunter.4...@gmail.com> > >>>> Sent: Monday, August 22, 2022 12:07 PM > >>>> To: Ebert,Timothy Aaron <teb...@ufl.edu> > >>>> Cc: Val <valkr...@gmail.com>; r-help@R-project.org > >>>> (r-help@r-project.org) <r-help@r-project.org> > >>>> Subject: Re: [R] Correlate > >>>> > >>>> [External Email] > >>>> > >>>> ... But of course the p-values are essentially meaningless without some > >>>> sort of multiplicity adjustment. > >>>> (search on "multiplicity adjustment" for details). :-( > >>>> > >>>> -- Bert > >>>> > >>>> > >>>> On Mon, Aug 22, 2022 at 8:59 AM Ebert,Timothy Aaron <teb...@ufl.edu> > >>>> wrote: > >>>>> > >>>>> A somewhat clunky solution: > >>>>> for(i in colnames(dat)){ > >>>>> print(cor.test(dat[,i], dat$x1, method = "pearson", use = > >>>>> "complete.obs")$estimate) > >>>>> print(cor.test(dat[,i], dat$x1, method = "pearson", use = > >>>>> "complete.obs")$p.value) } > >> > >> Because of missing data, this computes the correlations on different > >> subsets of the data. A simple solution is to filter the data for NAs: > >> > >> D <- na.omit(dat) > >> > >> More comments below: > >> > >>>>> > >>>>> Rather than printing you could set up an array or list to save the > >>>>> results. > >>>>> > >>>>> > >>>>> Tim > >>>>> > >>>>> -----Original Message----- > >>>>> From: R-help <r-help-boun...@r-project.org> On Behalf Of Val > >>>>> Sent: Monday, August 22, 2022 11:09 AM > >>>>> To: r-help@R-project.org (r-help@r-project.org) <r-help@r-project.org> > >>>>> Subject: [R] Correlate > >>>>> > >>>>> [External Email] > >>>>> > >>>>> Hi all, > >>>>> > >>>>> I have a data set with ~250 variables(columns). I want to calculate > >>>>> the correlation of one variable with the rest of the other variables > >>>>> and also want the p-values for each correlation. Please see the > >>>>> sample data and my attempt. I have got the correlation but unable to > >>>>> get the p-values > >>>>> > >>>>> dat <- read.table(text="x1 x2 x3 x4 > >>>>> 1.68 -0.96 -1.25 0.61 > >>>>> -0.06 0.41 0.06 -0.96 > >>>>> . 0.08 1.14 1.42 > >>>>> 0.80 -0.67 0.53 -0.68 > >>>>> 0.23 -0.97 -1.18 -0.78 > >>>>> -1.03 1.11 -0.61 . > >>>>> 2.15 . 0.02 0.66 > >>>>> 0.35 -0.37 -0.26 0.39 > >>>>> -0.66 0.89 . -1.49 > >>>>> 0.11 1.52 0.73 -1.03",header=TRUE) > >>>>> > >>>>> #change all to numeric > >>>>> dat[] <- lapply(dat, function(x) as.numeric(as.character(x))) > >> > >> This data manipulation is unnecessary. Just specify the argument > >> na.strings="." to read.table(). > >> > >>>>> > >>>>> data_cor <- cor(dat[ , colnames(dat) != "x1"], dat$x1, method = > >>>>> "pearson", use = "complete.obs") > >>>>> > >>>>> Result > >>>>> [,1] > >>>>> x2 -0.5845835 > >>>>> x3 -0.4664220 > >>>>> x4 0.7202837 > >>>>> > >>>>> How do I get the p-values ? > >> > >> Taking a somewhat different approach from cor.test(), you can apply > >> Fisher's z-transformation (recall that D is the data filtered for NAs): > >> > >> > 2*pnorm(abs(atanh(data_cor)), sd=1/sqrt(nrow(D) - 3), lower.tail=FALSE) > >> [,1] > >> x2 0.2462807 > >> x3 0.3812854 > >> x4 0.1156939 > >> > >> I hope this helps, > >> John > >> > >>>>> > >>>>> Thank you, > >>>>> > >>>>> ______________________________________________ > >>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>>>> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat > >>>>> .ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7Ctebert%40ufl > >>>>> .edu%7C871d5009dd3c455f398f08da84585e4a%7C0d4da0f84a314d76ace60a62331e > >>>>> 1b84%7C0%7C0%7C637967812337328788%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4w > >>>>> LjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C > >>>>> &sdata=3iAfMs1QzQARKF3lqUI8s43PX4IIkgEuQ9PUDyUtpqY%3D&reserved > >>>>> =0 PLEASE do read the posting guide > >>>>> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r > >>>>> -project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.edu% > >>>>> 7C871d5009dd3c455f398f08da84585e4a%7C0d4da0f84a314d76ace60a62331e1b84% > >>>>> 7C0%7C0%7C637967812337328788%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwM > >>>>> DAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C& > >>>>> sdata=v3IEonnPgg1xTKUzLK4rJc3cfMFxw5p%2FW6puha5CFz0%3D&reserved=0 > >>>>> and provide commented, minimal, self-contained, reproducible code. > >>>>> > >>>>> ______________________________________________ > >>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>>>> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat > >>>>> .ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7Ctebert%40ufl > >>>>> .edu%7C871d5009dd3c455f398f08da84585e4a%7C0d4da0f84a314d76ace60a62331e > >>>>> 1b84%7C0%7C0%7C637967812337328788%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4w > >>>>> LjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C > >>>>> &sdata=3iAfMs1QzQARKF3lqUI8s43PX4IIkgEuQ9PUDyUtpqY%3D&reserved > >>>>> =0 PLEASE do read the posting guide > >>>>> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r > >>>>> -project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.edu% > >>>>> 7C871d5009dd3c455f398f08da84585e4a%7C0d4da0f84a314d76ace60a62331e1b84% > >>>>> 7C0%7C0%7C637967812337328788%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwM > >>>>> DAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C& > >>>>> sdata=v3IEonnPgg1xTKUzLK4rJc3cfMFxw5p%2FW6puha5CFz0%3D&reserved=0 > >>>>> and provide commented, minimal, self-contained, reproducible code. > >>> > >>> ______________________________________________ > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >> -- > >> John Fox, Professor Emeritus > >> McMaster University > >> Hamilton, Ontario, Canada > >> web: https://socialsciences.mcmaster.ca/jfox/ > >> > -- > John Fox, Professor Emeritus > McMaster University > Hamilton, Ontario, Canada > web: https://socialsciences.mcmaster.ca/jfox/ > ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.