Re: [R-sig-Geo] Seeking guidance for best application of spdep localG_perm

Roger Bivand Thu, 04 Jul 2024 01:53:26 -0700

Thanks!

The issue Josiah referred to for pysal/esda is: 
https://github.com/pysal/esda/issues/199; as you can see, things take time. 
However, the problem was only found after Bivand & Wong (2018 
https://openaccess.nhh.no/nhh-xmlui/handle/11250/2565494?show=full, 
https://doi.org/10.1007/s11749-018-0599-x). So yes, for non-bell-shaped 
variables and using conditional permutations, "Pr(z != E(Gi)) Sim" should be 
preferred to "Pr(folded) Sim" and "Pr(z != E(Gi))". "Pr(z != E(Gi)) Sim" comes 
from punif() on the unit interval [0, 1], given the number of conditional 
simulations and the alternative (default "two.sided"). If the variable is 
bell-shaped, conditional permutation is probably unnecessary, as the analytical 
values will be very close to "Pr(z != E(Gi))" for the same alternative.

The general direction of https://r-spatial.org/book/15-Measures.html and 
https://doi.org/10.1111/gean.12319 may be summarised as:

1) ESDA is ESDA; p-values are just a measure, in no sense implying anything 
inferential. Use of the term "interesting" as proposed in Anselin (2019, 
https://doi.org/10.1111/gean.12164) is judicious. In all cases, adjustment for 
multiple comparisons is advisable, to avoid everything being seen as 
interesting. FDR-adjustment is a reasonable compromise.

2) ESDA is also prone to mix up global and local spatial dependence, and local 
spatial dependence with local spatial heterogeneity; this makes any statement 
about results conditional on their not being affected by those omissions.

3) ESDA should arguably be carried out on residuals of fitted models, because 
covariates and/or a modelled global spatial process could account for global 
dependence and heterogeneity.
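
As a minimal sketch of the multiple-comparison point in 1), using a made-up 
vector of pseudo-p-values (hypothetical numbers, not from any real analysis), 
base R's p.adjust() shows how FDR sits between no adjustment and Bonferroni:

```r
# Hypothetical pseudo-p-values for ten observations (made-up numbers,
# not taken from any real data set)
p <- c(0.001, 0.004, 0.012, 0.03, 0.04, 0.20, 0.35, 0.50, 0.70, 0.90)

p_fdr  <- p.adjust(p, method = "fdr")         # Benjamini-Hochberg
p_bonf <- p.adjust(p, method = "bonferroni")  # more conservative

# Count of observations flagged as "interesting" at a 0.05 cutoff
n_raw  <- sum(p < 0.05)
n_fdr  <- sum(p_fdr < 0.05)
n_bonf <- sum(p_bonf < 0.05)
c(raw = n_raw, fdr = n_fdr, bonferroni = n_bonf)
```

With these numbers, fewer observations are flagged after FDR adjustment than 
without adjustment, and fewer still with Bonferroni.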

All one gets from increasing the number of simulations is more digits in the 
p-value; the value, as a measure and not an inferential test, will be more 
precise. This is not p-hacking, it just increases the number of digits. With 99 
draws, and adding the observed Gi, if it scores rank 1, its one-sided "greater" 
pseudo-p will be 0.01. If we go to 999 draws, and the observed Gi still has 
rank 1, we get to 0.001, which is a more precise estimate of the same value; 
the observed Gi has not changed.

> spdep:::probs_lut(nsim=99, alternative="greater")[1]
[1] 0.01
> spdep:::probs_lut(nsim=999, alternative="greater")[1]
[1] 0.001
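
The same arithmetic can be sketched in base R without spdep internals. 
pseudo_p_greater() below is a hypothetical helper written for this 
illustration, not an spdep function, and the draws are plain uniform random 
numbers standing in for permuted Gi values:

```r
# One-sided "greater" pseudo-p-value: rank of the observed value among
# the nsim simulated values plus the observed one, divided by nsim + 1.
# Hypothetical helper for illustration only; not part of spdep.
pseudo_p_greater <- function(obs, sims) {
  (sum(sims >= obs) + 1) / (length(sims) + 1)
}

set.seed(1)
obs  <- 10  # observed value, larger than every uniform draw in [0, 1]
p99  <- pseudo_p_greater(obs, runif(99))
p999 <- pseudo_p_greater(obs, runif(999))
c(p99, p999)
```

When the observed value outranks every draw, the result is 1 / (nsim + 1), so 
more draws only refine the smallest attainable value.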

As permutations are random, results will vary between successive uses of the 
same function and arguments unless the RNG seed is re-set to the same value, so 
yielding the same stream(s) of random numbers.
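
A small base R sketch of the seed point, with sample() standing in for the 
permutation step (draw_perm() is a hypothetical helper; in practice one would 
call set.seed() before the permutation function itself):

```r
# Re-setting the seed to the same value reproduces the same permutation.
# draw_perm() is a hypothetical stand-in for one permutation-based run.
draw_perm <- function(seed) {
  set.seed(seed)
  sample(10)
}

a <- draw_perm(2024)
b <- draw_perm(2024)
identical(a, b)
```

Without re-setting the seed, successive calls to sample() continue along the 
random number stream and will generally give different permutations.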

Probably this should be written up properly; a broader community effort to 
provide guidance would be most welcome!

Please do comment if you feel that the above is too categorical!

Thanks for an interesting set of questions!

Roger

--
Roger Bivand
Emeritus Professor
Norwegian School of Economics
Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway
roger.biv...@nhh.no

________________________________________
From: R-sig-Geo <r-sig-geo-boun...@r-project.org> on behalf of Josiah Parry 
<josiah.pa...@gmail.com>
Sent: 03 July 2024 13:34
To: Cunningham, Angela
Cc: r-sig-geo@r-project.org
Subject: Re: [R-sig-Geo] Seeking guidance for best application of spdep 
localG_perm

This is all very well said! I would recommend using the percentile-based
approach that Roger implemented. The Pysal folks are in the process of
adopting it (with a slight adjustment). I think it is the most “accurate”
p-value you will get from the functions today.

I don’t have a recommendation for the upper bound. But you do bring up a
good point about the classification of them. I don’t think I’m as qualified
to answer that!

On Wed, Jul 3, 2024 at 06:25 Cunningham, Angela via R-sig-Geo <
r-sig-geo@r-project.org> wrote:

> Hello all,
>
> I am using spdep (via sfdep) for a cluster analysis of the rate of rare
> events.  I am hoping you can provide some advice on how to apply these
> functions most appropriately. Specifically I am interested in any guidance
> about which significance calculation might be best in these circumstances,
> and which (if any) adjustment for multiple testing and spatial dependence
> (Bonferroni, FDR, etc) should be paired with the different p value
> calculations.
>
> When running localG_perm(), three Pr values are returned: Pr(z != E(Gi)),
> Pr(z != E(Gi)) Sim, and Pr(folded) Sim. My understanding is that the first
> value is based on the mean and should only be used for normally distributed
> data, that the second uses a rank-percentile approach and is more robust,
> and that the last uses a Pysal-based calculation and may be quite
> sensitive. Is this correct? The second, Pr(z != E(Gi)) Sim, appears to be
> the most appropriate for my data situation; would you suggest otherwise?
>
> The documentation for localG_perm states that "for inference, a
> Bonferroni-type test is suggested"; thus any adjustments for e.g. multiple
> testing must be made in a second step, such as with the p.adjust arguments
> in the hotspot() function, correct? Further, while fdr is the default for
> hotspot(), are there situations like having small numbers, a large number
> of simulations, or employing a particular Prname which would recommend a
> different p.adjust method?
>
> Also, if I can bother you all with a very basic question: given that
> significance is determined through conditional permutation simulation,
> increasing the number of simulations should refine the results and make
> them more reliable, but unless a seed is set, I assume that it is still always
> possible that results will change slightly across separate runs of a model,
> perhaps shifting an observation to either side of a threshold. Aside from
> computation time, are there other reasons to avoid increasing the number of
> simulations beyond a certain point? (It feels a bit like "p-hacking" to
> increase the nsim ad infinitum.) Are slight discrepancies in hot spot
> assignment between runs even with a large number of permutations to be
> expected? Is this particularly the case when working with small numbers?
>
> Thank you for your time and consideration.
>
>
> Angela R Cunningham, PhD
> Spatial Demographer (R&D Associate)
> Human Geography Group | Human Dynamics Section
>
> Oak Ridge National Laboratory
> Computational Sciences Building (5600), O401-29
> 1 Bethel Valley Road, Oak Ridge, TN 37830
> cunningha...@ornl.gov
>
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
