[R-sig-eco] Mantel test with multiple imputation, P values
Dear List, I have an environmental data set with several missing values that I am trying to relate to a community structure data set using a mantel test. One solution to the missing data problem seems to be multiple imputation; I am using the Amelia package. This generates several (five in this example) imputed data sets. I can run mantel on each of these and come up with five similar but not identical solutions. I figure I can average the mantel rho values. However, I am not sure what to do about the P values. From looking around online, it looks like I shouldn't take the average of p values. I found this reference http://missingdata.lshtm.ac.uk/index.php?option=com_contentview=articleid=164:combining-p-values-from-multiple-imputationscatid=57:multiple-imputationItemid=98 that seems to have promising suggestions, but I can't seem to figure out how I'd implement any of these in R. I was hoping somebody might have a suggestion for how I could combine my p values. One option, I think would be to take the highest (worst) p value (in the example below, this would be p = 0.012). However for large numbers of imputations, I am believe that this method might be to conservative. Another option might be to take the p value corresponding to the median rho score (in the example below this would be p =0.008). Thoughts? -Jacob ##Example Code Below require(Amelia) require(vegan) require(ecodist) ##Species data matrix with environmental data that are missing some values. data(varespec) data(varechem) varechem.missings - varechem[,c(N, P, K)] varechem.missings[c(1,5, 7, 15, 20),1] - NA varechem.missings[c(1,2, 9, 21), 2] - NA #I multiply impute the missing values with the Amelia package imps - 5 #imps - 25 varechem.amelia - amelia(varechem.missings, m = imps) #for each imputation of the environmental data I run a mantel test and save #the results to mresults mresults - NULL for(i in 1:imps){ varespec.dist - vegdist(varespec) varechem.am.dist - dist(varechem.amelia$imputations[[i]]) mresults - rbind(mresults, (ecodist::mantel(varespec.dist~varechem.am.dist))) } mresults ##mantelr pval1 pval2 pval3 llim.2.5% ulim.97.5% ## [1,] 0.2137656 0.008 0.993 0.008 0.1015176 0.3389979 ## [2,] 0.2162388 0.011 0.990 0.011 0.1207528 0.3346554 ## [3,] 0.2149556 0.012 0.989 0.016 0.1319943 0.3279028 ## [4,] 0.2101820 0.009 0.992 0.012 0.1217293 0.3288272 ## [5,] 0.2135279 0.006 0.995 0.006 0.1130386 0.3359864 #based on these results what would be a reasonable p value to report for the environmental parameters relating to the community structure? ##end example [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Mantel test with multiple imputation, P values
Hi Jacob, comments below. On Aug 18, 2013 2:31 PM, Jacob Cram cramj...@gmail.com wrote: Dear List, I have an environmental data set with several missing values that I am trying to relate to a community structure data set using a mantel test. One solution to the missing data problem seems to be multiple imputation; I am using the Amelia package. This generates several (five in this example) imputed data sets. I can run mantel on each of these and come up with five similar but not identical solutions. I figure I can average the mantel rho values. However, I am not sure what to do about the P values. From looking around online, it looks like I shouldn't take the average of p values. I found this reference http://missingdata.lshtm.ac.uk/index.php?option=com_contentview=articleid=164:combining-p-values-from-multiple-imputationscatid=57:multiple-imputationItemid=98 that seems to have promising suggestions, but I can't seem to figure out how I'd implement any of these in R. So following that link and reading the Li et al. reference it looks as though the procedure is well described at the top of page 71. You get your parameter estimate from the usual procedure. The test statistic, written as D, is the distance between the null value and the estimated value with some scaling specified in eq. 1.17. They use the F distribution and k and m (the number of imputations) degrees of freedom. I don't think you need to reinvent some inferior ecologists-only procedure for this. Krzysztof I was hoping somebody might have a suggestion for how I could combine my p values. One option, I think would be to take the highest (worst) p value (in the example below, this would be p = 0.012). However for large numbers of imputations, I am believe that this method might be to conservative. Another option might be to take the p value corresponding to the median rho score (in the example below this would be p =0.008). Thoughts? -Jacob ##Example Code Below require(Amelia) require(vegan) require(ecodist) ##Species data matrix with environmental data that are missing some values. data(varespec) data(varechem) varechem.missings - varechem[,c(N, P, K)] varechem.missings[c(1,5, 7, 15, 20),1] - NA varechem.missings[c(1,2, 9, 21), 2] - NA #I multiply impute the missing values with the Amelia package imps - 5 #imps - 25 varechem.amelia - amelia(varechem.missings, m = imps) #for each imputation of the environmental data I run a mantel test and save #the results to mresults mresults - NULL for(i in 1:imps){ varespec.dist - vegdist(varespec) varechem.am.dist - dist(varechem.amelia$imputations[[i]]) mresults - rbind(mresults, (ecodist::mantel(varespec.dist~varechem.am.dist))) } mresults ##mantelr pval1 pval2 pval3 llim.2.5% ulim.97.5% ## [1,] 0.2137656 0.008 0.993 0.008 0.1015176 0.3389979 ## [2,] 0.2162388 0.011 0.990 0.011 0.1207528 0.3346554 ## [3,] 0.2149556 0.012 0.989 0.016 0.1319943 0.3279028 ## [4,] 0.2101820 0.009 0.992 0.012 0.1217293 0.3288272 ## [5,] 0.2135279 0.006 0.995 0.006 0.1130386 0.3359864 #based on these results what would be a reasonable p value to report for the environmental parameters relating to the community structure? ##end example [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Mantel test with multiple imputation, P values
Krzystof, Thank you for your reply. It is promising that you think that Li's method should work for imputed mantel results. That said, I am a bit over my head with the math in the Li et al reference. Could you (or anyone on this list) provide an R-code example of how D would be calculated from the output of several mantel tests? Also, and forgive my ignorance if this statement is coming from the wrong direction, It is my understanding that mantel P values are generally calculated by actually permuting the rows of the similarity matrix and re-running the mantel test a given number of times (usually 1000). Accordingly, as far as I can tell there is no explicitly generated F value corresponding to a mantel p value. It seems like Li's method assumes I am generating P from an F table. Would it be appropriate to back calculate F from P, k and m? Thanks again, Jacob On Sun, Aug 18, 2013 at 2:19 PM, Krzysztof Sakrejda krzysztof.sakre...@gmail.com wrote: Hi Jacob, comments below. On Aug 18, 2013 2:31 PM, Jacob Cram cramj...@gmail.com wrote: Dear List, I have an environmental data set with several missing values that I am trying to relate to a community structure data set using a mantel test. One solution to the missing data problem seems to be multiple imputation; I am using the Amelia package. This generates several (five in this example) imputed data sets. I can run mantel on each of these and come up with five similar but not identical solutions. I figure I can average the mantel rho values. However, I am not sure what to do about the P values. From looking around online, it looks like I shouldn't take the average of p values. I found this reference http://missingdata.lshtm.ac.uk/index.php?option=com_contentview=articleid=164:combining-p-values-from-multiple-imputationscatid=57:multiple-imputationItemid=98 that seems to have promising suggestions, but I can't seem to figure out how I'd implement any of these in R. So following that link and reading the Li et al. reference it looks as though the procedure is well described at the top of page 71. You get your parameter estimate from the usual procedure. The test statistic, written as D, is the distance between the null value and the estimated value with some scaling specified in eq. 1.17. They use the F distribution and k and m (the number of imputations) degrees of freedom. I don't think you need to reinvent some inferior ecologists-only procedure for this. Krzysztof I was hoping somebody might have a suggestion for how I could combine my p values. One option, I think would be to take the highest (worst) p value (in the example below, this would be p = 0.012). However for large numbers of imputations, I am believe that this method might be to conservative. Another option might be to take the p value corresponding to the median rho score (in the example below this would be p =0.008). Thoughts? -Jacob ##Example Code Below require(Amelia) require(vegan) require(ecodist) ##Species data matrix with environmental data that are missing some values. data(varespec) data(varechem) varechem.missings - varechem[,c(N, P, K)] varechem.missings[c(1,5, 7, 15, 20),1] - NA varechem.missings[c(1,2, 9, 21), 2] - NA #I multiply impute the missing values with the Amelia package imps - 5 #imps - 25 varechem.amelia - amelia(varechem.missings, m = imps) #for each imputation of the environmental data I run a mantel test and save #the results to mresults mresults - NULL for(i in 1:imps){ varespec.dist - vegdist(varespec) varechem.am.dist - dist(varechem.amelia$imputations[[i]]) mresults - rbind(mresults, (ecodist::mantel(varespec.dist~varechem.am.dist))) } mresults ##mantelr pval1 pval2 pval3 llim.2.5% ulim.97.5% ## [1,] 0.2137656 0.008 0.993 0.008 0.1015176 0.3389979 ## [2,] 0.2162388 0.011 0.990 0.011 0.1207528 0.3346554 ## [3,] 0.2149556 0.012 0.989 0.016 0.1319943 0.3279028 ## [4,] 0.2101820 0.009 0.992 0.012 0.1217293 0.3288272 ## [5,] 0.2135279 0.006 0.995 0.006 0.1130386 0.3359864 #based on these results what would be a reasonable p value to report for the environmental parameters relating to the community structure? ##end example [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Mantel test with multiple imputation, P values
I think D is the easy part, it should be the mean Mantel test statistic from all the imputed data sets. The rest I'm not sure yet, but if you get a good answer from someone else, I'd appreciate it if you shared it. Krzysztof On Aug 18, 2013 4:44 PM, Jacob Cram cramj...@gmail.com wrote: Krzystof, Thank you for your reply. It is promising that you think that Li's method should work for imputed mantel results. That said, I am a bit over my head with the math in the Li et al reference. Could you (or anyone on this list) provide an R-code example of how D would be calculated from the output of several mantel tests? Also, and forgive my ignorance if this statement is coming from the wrong direction, It is my understanding that mantel P values are generally calculated by actually permuting the rows of the similarity matrix and re-running the mantel test a given number of times (usually 1000). Accordingly, as far as I can tell there is no explicitly generated F value corresponding to a mantel p value. It seems like Li's method assumes I am generating P from an F table. Would it be appropriate to back calculate F from P, k and m? Thanks again, Jacob On Sun, Aug 18, 2013 at 2:19 PM, Krzysztof Sakrejda krzysztof.sakre...@gmail.com wrote: Hi Jacob, comments below. On Aug 18, 2013 2:31 PM, Jacob Cram cramj...@gmail.com wrote: Dear List, I have an environmental data set with several missing values that I am trying to relate to a community structure data set using a mantel test. One solution to the missing data problem seems to be multiple imputation; I am using the Amelia package. This generates several (five in this example) imputed data sets. I can run mantel on each of these and come up with five similar but not identical solutions. I figure I can average the mantel rho values. However, I am not sure what to do about the P values. From looking around online, it looks like I shouldn't take the average of p values. I found this reference http://missingdata.lshtm.ac.uk/index.php?option=com_contentview=articleid=164:combining-p-values-from-multiple-imputationscatid=57:multiple-imputationItemid=98 that seems to have promising suggestions, but I can't seem to figure out how I'd implement any of these in R. So following that link and reading the Li et al. reference it looks as though the procedure is well described at the top of page 71. You get your parameter estimate from the usual procedure. The test statistic, written as D, is the distance between the null value and the estimated value with some scaling specified in eq. 1.17. They use the F distribution and k and m (the number of imputations) degrees of freedom. I don't think you need to reinvent some inferior ecologists-only procedure for this. Krzysztof I was hoping somebody might have a suggestion for how I could combine my p values. One option, I think would be to take the highest (worst) p value (in the example below, this would be p = 0.012). However for large numbers of imputations, I am believe that this method might be to conservative. Another option might be to take the p value corresponding to the median rho score (in the example below this would be p =0.008). Thoughts? -Jacob ##Example Code Below require(Amelia) require(vegan) require(ecodist) ##Species data matrix with environmental data that are missing some values. data(varespec) data(varechem) varechem.missings - varechem[,c(N, P, K)] varechem.missings[c(1,5, 7, 15, 20),1] - NA varechem.missings[c(1,2, 9, 21), 2] - NA #I multiply impute the missing values with the Amelia package imps - 5 #imps - 25 varechem.amelia - amelia(varechem.missings, m = imps) #for each imputation of the environmental data I run a mantel test and save #the results to mresults mresults - NULL for(i in 1:imps){ varespec.dist - vegdist(varespec) varechem.am.dist - dist(varechem.amelia$imputations[[i]]) mresults - rbind(mresults, (ecodist::mantel(varespec.dist~varechem.am.dist))) } mresults ##mantelr pval1 pval2 pval3 llim.2.5% ulim.97.5% ## [1,] 0.2137656 0.008 0.993 0.008 0.1015176 0.3389979 ## [2,] 0.2162388 0.011 0.990 0.011 0.1207528 0.3346554 ## [3,] 0.2149556 0.012 0.989 0.016 0.1319943 0.3279028 ## [4,] 0.2101820 0.009 0.992 0.012 0.1217293 0.3288272 ## [5,] 0.2135279 0.006 0.995 0.006 0.1130386 0.3359864 #based on these results what would be a reasonable p value to report for the environmental parameters relating to the community structure? ##end example [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology [[alternative HTML version deleted]] ___ R-sig-ecology mailing list