Hello all,

I am using spdep (via sfdep) for a cluster analysis of rates of rare events, 
and I am hoping you can advise me on how to apply these functions most 
appropriately. Specifically, I would welcome guidance on which significance 
calculation is best suited to these circumstances, and on which adjustment 
for multiple testing and spatial dependence (Bonferroni, FDR, etc.), if any, 
should be paired with each of the p-value calculations.

When running localG_perm(), three Pr values are returned: Pr(z != E(Gi)), 
Pr(z != E(Gi)) Sim, and Pr(folded) Sim. My understanding is that the first is 
the analytical value based on the normal approximation, and so should only be 
relied on where that approximation holds; that the second uses a rank 
(percentile) approach on the simulated values and is more robust; and that the 
last uses a PySAL-style folded calculation and may be quite sensitive. Is this 
correct? The second, Pr(z != E(Gi)) Sim, appears to be the most appropriate 
for my data situation; would you suggest otherwise?
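
For concreteness, this is a minimal sketch of the sort of call I am making, 
using the nc SIDS data shipped with sf as a stand-in for my actual rare-event 
rates (my real data and weights differ):

    library(sf)
    library(spdep)

    ## Stand-in data: county counts of a rare event (SID74) over births (BIR74)
    nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
    nb <- poly2nb(nc)                  # queen contiguity neighbours
    lw <- nb2listw(nb, style = "W")    # row-standardised weights

    set.seed(42)                       # reproducible conditional permutations
    g <- localG_perm(nc$SID74 / nc$BIR74, lw, nsim = 999)

    ## The three Pr columns are kept in the "internals" attribute
    colnames(attr(g, "internals"))
    head(attr(g, "internals"))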

The documentation for localG_perm() states that "for inference, a 
Bonferroni-type test is suggested"; am I right that any adjustment for 
multiple testing must therefore be made in a second step, for example via the 
p.adjust argument of the hotspot() function? Further, while "fdr" is the 
default for hotspot(), are there situations (small counts, a large number of 
simulations, a particular choice of Prname) that would recommend a different 
p.adjust method?
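
If I have understood correctly, the adjustment is then a second step along 
these lines (continuing from the sketch above; the cutoff value is only for 
illustration):

    ## hotspot() adjusts the chosen Prname with stats::p.adjust()
    ## and classifies observations against the cutoff
    hs_fdr  <- hotspot(g, Prname = "Pr(z != E(Gi)) Sim",
                       cutoff = 0.05, p.adjust = "fdr")
    hs_bonf <- hotspot(g, Prname = "Pr(z != E(Gi)) Sim",
                       cutoff = 0.05, p.adjust = "bonferroni")
    table(fdr = addNA(hs_fdr), bonferroni = addNA(hs_bonf))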

Also, if I may bother you all with a very basic question: given that 
significance is determined through conditional permutation simulation, 
increasing the number of simulations should refine the results and make them 
more reliable. Unless a seed is set, however, I assume it remains possible for 
results to change slightly across separate runs of a model, perhaps shifting 
an observation to either side of a threshold. Aside from computation time, are 
there other reasons to avoid increasing the number of simulations beyond a 
certain point? (It feels a bit like "p-hacking" to increase nsim ad 
infinitum.) Are slight discrepancies in hot spot assignment between runs to be 
expected even with a large number of permutations, particularly when working 
with small counts?
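
To make this concrete, here is the sort of check I have been running (same 
stand-in data as above); since the simulated p-values move in steps of 
1/(nsim + 1), I would expect some run-to-run wobble near any cutoff:

    ## Two runs with different seeds to gauge Monte Carlo variability
    x <- nc$SID74 / nc$BIR74
    set.seed(1); g1 <- localG_perm(x, lw, nsim = 9999)
    set.seed(2); g2 <- localG_perm(x, lw, nsim = 9999)
    p1 <- attr(g1, "internals")[, "Pr(z != E(Gi)) Sim"]
    p2 <- attr(g2, "internals")[, "Pr(z != E(Gi)) Sim"]
    summary(abs(p1 - p2))                        # run-to-run differences
    table(run1 = p1 <= 0.05, run2 = p2 <= 0.05)  # any threshold flips?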

Thank you for your time and consideration.


Angela R Cunningham, PhD
Spatial Demographer (R&D Associate)
Human Geography Group | Human Dynamics Section

Oak Ridge National Laboratory
Computational Sciences Building (5600), O401-29
1 Bethel Valley Road, Oak Ridge, TN 37830
cunningha...@ornl.gov

