[R-sig-eco] Mantel test with multiple imputation, P values

2013-08-18 Thread Jacob Cram
Dear List,
  I have an environmental data set with several missing values that I
am trying to relate to a community structure data set using a mantel
test.   One solution to the missing data problem seems to be multiple
imputation; I am using the Amelia package. This generates several (five in
this example) imputed data sets.  I can run mantel on each of these and
come up with five similar but not identical solutions.  I figure I can
average the mantel rho values.  However, I am not sure what to do about the
P values. From looking around online, it looks like I shouldn't take the
average of p values.  I found this reference 
http://missingdata.lshtm.ac.uk/index.php?option=com_contentview=articleid=164:combining-p-values-from-multiple-imputationscatid=57:multiple-imputationItemid=98
that seems to have promising suggestions, but I can't seem to figure out
how I'd implement any of these in R.  I was hoping somebody might have a
suggestion for how I could combine my p values.  One option, I think would
be to take the highest (worst) p value (in the example below, this would be
p = 0.012).  However for large numbers of imputations, I am believe that
this method might be to conservative.  Another option might be to take the
p value corresponding to the median rho score (in the example below this
would be p =0.008).   Thoughts?
-Jacob


##Example Code Below
require(Amelia)
require(vegan)
require(ecodist)

##Species data matrix with environmental data that are missing some values.
data(varespec)
data(varechem)
varechem.missings - varechem[,c(N, P, K)]
varechem.missings[c(1,5, 7, 15, 20),1] - NA
varechem.missings[c(1,2, 9, 21), 2] - NA

#I multiply impute the missing values with the Amelia package
imps - 5
#imps - 25
varechem.amelia - amelia(varechem.missings, m = imps)


#for each imputation of the environmental data I run a mantel test and save
#the results to mresults
mresults - NULL

for(i in 1:imps){
varespec.dist - vegdist(varespec)
varechem.am.dist - dist(varechem.amelia$imputations[[i]])
mresults - rbind(mresults,
(ecodist::mantel(varespec.dist~varechem.am.dist)))
}

mresults

##mantelr pval1 pval2 pval3 llim.2.5% ulim.97.5%
## [1,] 0.2137656 0.008 0.993 0.008 0.1015176  0.3389979
## [2,] 0.2162388 0.011 0.990 0.011 0.1207528  0.3346554
## [3,] 0.2149556 0.012 0.989 0.016 0.1319943  0.3279028
## [4,] 0.2101820 0.009 0.992 0.012 0.1217293  0.3288272
## [5,] 0.2135279 0.006 0.995 0.006 0.1130386  0.3359864

#based on these results what would be a reasonable p value to report for
the environmental parameters relating to the community structure?

##end example

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Mantel test with multiple imputation, P values

2013-08-18 Thread Krzysztof Sakrejda
Hi Jacob, comments below.

On Aug 18, 2013 2:31 PM, Jacob Cram cramj...@gmail.com wrote:

 Dear List,
   I have an environmental data set with several missing values that I
 am trying to relate to a community structure data set using a mantel
 test.   One solution to the missing data problem seems to be multiple
 imputation; I am using the Amelia package. This generates several (five in
 this example) imputed data sets.  I can run mantel on each of these and
 come up with five similar but not identical solutions.  I figure I can
 average the mantel rho values.  However, I am not sure what to do about
the
 P values. From looking around online, it looks like I shouldn't take the
 average of p values.  I found this reference 

http://missingdata.lshtm.ac.uk/index.php?option=com_contentview=articleid=164:combining-p-values-from-multiple-imputationscatid=57:multiple-imputationItemid=98

 that seems to have promising suggestions, but I can't seem to figure out
 how I'd implement any of these in R.

So following that link and reading the Li et al. reference it looks as
though the procedure is well described at the top of page 71. You get your
parameter estimate from the usual procedure. The test statistic, written as
D, is the distance between the null value and the estimated value with
some scaling specified in eq. 1.17. They use the F distribution and k and m
(the number of imputations) degrees of freedom. I don't think you need to
reinvent some inferior ecologists-only procedure for this.

Krzysztof

I was hoping somebody might have a
 suggestion for how I could combine my p values.  One option, I think would
 be to take the highest (worst) p value (in the example below, this would
be
 p = 0.012).  However for large numbers of imputations, I am believe that
 this method might be to conservative.  Another option might be to take the
 p value corresponding to the median rho score (in the example below this
 would be p =0.008).   Thoughts?
 -Jacob


 ##Example Code Below
 require(Amelia)
 require(vegan)
 require(ecodist)

 ##Species data matrix with environmental data that are missing some
values.
 data(varespec)
 data(varechem)
 varechem.missings - varechem[,c(N, P, K)]
 varechem.missings[c(1,5, 7, 15, 20),1] - NA
 varechem.missings[c(1,2, 9, 21), 2] - NA

 #I multiply impute the missing values with the Amelia package
 imps - 5
 #imps - 25
 varechem.amelia - amelia(varechem.missings, m = imps)


 #for each imputation of the environmental data I run a mantel test and
save
 #the results to mresults
 mresults - NULL

 for(i in 1:imps){
 varespec.dist - vegdist(varespec)
 varechem.am.dist - dist(varechem.amelia$imputations[[i]])
 mresults - rbind(mresults,
 (ecodist::mantel(varespec.dist~varechem.am.dist)))
 }

 mresults

 ##mantelr pval1 pval2 pval3 llim.2.5% ulim.97.5%
 ## [1,] 0.2137656 0.008 0.993 0.008 0.1015176  0.3389979
 ## [2,] 0.2162388 0.011 0.990 0.011 0.1207528  0.3346554
 ## [3,] 0.2149556 0.012 0.989 0.016 0.1319943  0.3279028
 ## [4,] 0.2101820 0.009 0.992 0.012 0.1217293  0.3288272
 ## [5,] 0.2135279 0.006 0.995 0.006 0.1130386  0.3359864

 #based on these results what would be a reasonable p value to report for
 the environmental parameters relating to the community structure?

 ##end example

 [[alternative HTML version deleted]]

 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Mantel test with multiple imputation, P values

2013-08-18 Thread Jacob Cram
Krzystof,
   Thank you for your reply.  It is promising that you think that Li's
method should work for imputed mantel results.  That said, I am a bit over
my head with the math in the Li et al reference.  Could you (or anyone on
this list) provide an R-code example of how D would be calculated from the
output of several mantel tests?

Also, and forgive my ignorance if this statement is coming from the wrong
direction, It is my understanding that mantel P values are generally
calculated by actually permuting the rows of the similarity matrix and
re-running the mantel test a given number of times (usually 1000).
Accordingly, as far as I can tell there is no explicitly generated F value
corresponding to a mantel p value.  It seems like Li's method assumes I am
generating P from an F table.  Would it be appropriate to back calculate F
from P, k and m?

Thanks again,
Jacob



On Sun, Aug 18, 2013 at 2:19 PM, Krzysztof Sakrejda 
krzysztof.sakre...@gmail.com wrote:

 Hi Jacob, comments below.

 On Aug 18, 2013 2:31 PM, Jacob Cram cramj...@gmail.com wrote:
 
  Dear List,
I have an environmental data set with several missing values that I
  am trying to relate to a community structure data set using a mantel
  test.   One solution to the missing data problem seems to be multiple
  imputation; I am using the Amelia package. This generates several (five
 in
  this example) imputed data sets.  I can run mantel on each of these and
  come up with five similar but not identical solutions.  I figure I can
  average the mantel rho values.  However, I am not sure what to do about
 the
  P values. From looking around online, it looks like I shouldn't take the
  average of p values.  I found this reference 
 
 http://missingdata.lshtm.ac.uk/index.php?option=com_contentview=articleid=164:combining-p-values-from-multiple-imputationscatid=57:multiple-imputationItemid=98
 
  that seems to have promising suggestions, but I can't seem to figure out
  how I'd implement any of these in R.

 So following that link and reading the Li et al. reference it looks as
 though the procedure is well described at the top of page 71. You get your
 parameter estimate from the usual procedure. The test statistic, written as
 D, is the distance between the null value and the estimated value with
 some scaling specified in eq. 1.17. They use the F distribution and k and m
 (the number of imputations) degrees of freedom. I don't think you need to
 reinvent some inferior ecologists-only procedure for this.

 Krzysztof

 I was hoping somebody might have a
  suggestion for how I could combine my p values.  One option, I think
 would
  be to take the highest (worst) p value (in the example below, this would
 be
  p = 0.012).  However for large numbers of imputations, I am believe that
  this method might be to conservative.  Another option might be to take
 the
  p value corresponding to the median rho score (in the example below this
  would be p =0.008).   Thoughts?
  -Jacob
 
 
  ##Example Code Below
  require(Amelia)
  require(vegan)
  require(ecodist)
 
  ##Species data matrix with environmental data that are missing some
 values.
  data(varespec)
  data(varechem)
  varechem.missings - varechem[,c(N, P, K)]
  varechem.missings[c(1,5, 7, 15, 20),1] - NA
  varechem.missings[c(1,2, 9, 21), 2] - NA
 
  #I multiply impute the missing values with the Amelia package
  imps - 5
  #imps - 25
  varechem.amelia - amelia(varechem.missings, m = imps)
 
 
  #for each imputation of the environmental data I run a mantel test and
 save
  #the results to mresults
  mresults - NULL
 
  for(i in 1:imps){
  varespec.dist - vegdist(varespec)
  varechem.am.dist - dist(varechem.amelia$imputations[[i]])
  mresults - rbind(mresults,
  (ecodist::mantel(varespec.dist~varechem.am.dist)))
  }
 
  mresults
 
  ##mantelr pval1 pval2 pval3 llim.2.5% ulim.97.5%
  ## [1,] 0.2137656 0.008 0.993 0.008 0.1015176  0.3389979
  ## [2,] 0.2162388 0.011 0.990 0.011 0.1207528  0.3346554
  ## [3,] 0.2149556 0.012 0.989 0.016 0.1319943  0.3279028
  ## [4,] 0.2101820 0.009 0.992 0.012 0.1217293  0.3288272
  ## [5,] 0.2135279 0.006 0.995 0.006 0.1130386  0.3359864
 
  #based on these results what would be a reasonable p value to report for
  the environmental parameters relating to the community structure?
 
  ##end example
 
  [[alternative HTML version deleted]]
 
  ___
  R-sig-ecology mailing list
  R-sig-ecology@r-project.org
  https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Mantel test with multiple imputation, P values

2013-08-18 Thread Krzysztof Sakrejda
I think D is the easy part, it should be the mean Mantel test statistic
from all the imputed data sets.  The rest I'm not sure yet, but if you get
a good answer from someone else, I'd appreciate it if you shared it.
Krzysztof
On Aug 18, 2013 4:44 PM, Jacob Cram cramj...@gmail.com wrote:

 Krzystof,
Thank you for your reply.  It is promising that you think that Li's
 method should work for imputed mantel results.  That said, I am a bit over
 my head with the math in the Li et al reference.  Could you (or anyone on
 this list) provide an R-code example of how D would be calculated from the
 output of several mantel tests?

 Also, and forgive my ignorance if this statement is coming from the wrong
 direction, It is my understanding that mantel P values are generally
 calculated by actually permuting the rows of the similarity matrix and
 re-running the mantel test a given number of times (usually 1000).
 Accordingly, as far as I can tell there is no explicitly generated F value
 corresponding to a mantel p value.  It seems like Li's method assumes I am
 generating P from an F table.  Would it be appropriate to back calculate F
 from P, k and m?

 Thanks again,
 Jacob



 On Sun, Aug 18, 2013 at 2:19 PM, Krzysztof Sakrejda 
 krzysztof.sakre...@gmail.com wrote:

 Hi Jacob, comments below.

 On Aug 18, 2013 2:31 PM, Jacob Cram cramj...@gmail.com wrote:
 
  Dear List,
I have an environmental data set with several missing values that
 I
  am trying to relate to a community structure data set using a mantel
  test.   One solution to the missing data problem seems to be multiple
  imputation; I am using the Amelia package. This generates several (five
 in
  this example) imputed data sets.  I can run mantel on each of these and
  come up with five similar but not identical solutions.  I figure I can
  average the mantel rho values.  However, I am not sure what to do about
 the
  P values. From looking around online, it looks like I shouldn't take the
  average of p values.  I found this reference 
 
 http://missingdata.lshtm.ac.uk/index.php?option=com_contentview=articleid=164:combining-p-values-from-multiple-imputationscatid=57:multiple-imputationItemid=98
 
  that seems to have promising suggestions, but I can't seem to figure out
  how I'd implement any of these in R.

 So following that link and reading the Li et al. reference it looks as
 though the procedure is well described at the top of page 71. You get your
 parameter estimate from the usual procedure. The test statistic, written as
 D, is the distance between the null value and the estimated value with
 some scaling specified in eq. 1.17. They use the F distribution and k and m
 (the number of imputations) degrees of freedom. I don't think you need to
 reinvent some inferior ecologists-only procedure for this.

 Krzysztof

 I was hoping somebody might have a
  suggestion for how I could combine my p values.  One option, I think
 would
  be to take the highest (worst) p value (in the example below, this
 would be
  p = 0.012).  However for large numbers of imputations, I am believe that
  this method might be to conservative.  Another option might be to take
 the
  p value corresponding to the median rho score (in the example below this
  would be p =0.008).   Thoughts?
  -Jacob
 
 
  ##Example Code Below
  require(Amelia)
  require(vegan)
  require(ecodist)
 
  ##Species data matrix with environmental data that are missing some
 values.
  data(varespec)
  data(varechem)
  varechem.missings - varechem[,c(N, P, K)]
  varechem.missings[c(1,5, 7, 15, 20),1] - NA
  varechem.missings[c(1,2, 9, 21), 2] - NA
 
  #I multiply impute the missing values with the Amelia package
  imps - 5
  #imps - 25
  varechem.amelia - amelia(varechem.missings, m = imps)
 
 
  #for each imputation of the environmental data I run a mantel test and
 save
  #the results to mresults
  mresults - NULL
 
  for(i in 1:imps){
  varespec.dist - vegdist(varespec)
  varechem.am.dist - dist(varechem.amelia$imputations[[i]])
  mresults - rbind(mresults,
  (ecodist::mantel(varespec.dist~varechem.am.dist)))
  }
 
  mresults
 
  ##mantelr pval1 pval2 pval3 llim.2.5% ulim.97.5%
  ## [1,] 0.2137656 0.008 0.993 0.008 0.1015176  0.3389979
  ## [2,] 0.2162388 0.011 0.990 0.011 0.1207528  0.3346554
  ## [3,] 0.2149556 0.012 0.989 0.016 0.1319943  0.3279028
  ## [4,] 0.2101820 0.009 0.992 0.012 0.1217293  0.3288272
  ## [5,] 0.2135279 0.006 0.995 0.006 0.1130386  0.3359864
 
  #based on these results what would be a reasonable p value to report for
  the environmental parameters relating to the community structure?
 
  ##end example
 
  [[alternative HTML version deleted]]
 
  ___
  R-sig-ecology mailing list
  R-sig-ecology@r-project.org
  https://stat.ethz.ch/mailman/listinfo/r-sig-ecology




[[alternative HTML version deleted]]

___
R-sig-ecology mailing list