[R-sig-Geo] Spatial autocorrelation test in a dataset with multiple observations per region

Hongyun Wang Wed, 18 Dec 2024 14:26:28 -0800

Hello Everyone,

I have one question on the spatial autoregression tests on primary variables or 
residuals of linear regression models of a dataset that has multiple 
observations per region. Here I use columbus dataset and moran.test() on a 
primary variable CRIME as illustration.


I modified the dataset to have only 7 regions, not 49 regions in original 
dataset, and each region has 7 observations. I used poly2nb() and nb2listw() to 
create neighbors and weights list of each region. But after examining the list 
carefully, I found that two observations within the same region have weight 1. 
This looks not right. I think poly2nb function treats each observation (a row 
in the dataset) as a region. Then I modified the neighbors and weights list to 
have weight 0 for two observations that are in the same region. The moran.test 
results from original listw object (listw0) and modified listw object (listw) 
have different Z-scores and p-values. The which moran.test is correct, or are 
both incorrect?

> library(dplyr)
> library(spdep)
>
> # read dataset and modify it: only 7 regions, 7 observations per region.
> columbus <- st_read(system.file("shapes/columbus.shp", package="spData")[1], 
> quiet=TRUE) %>%
+   mutate(regionId = rep(LETTERS[1:7], each = 7),
+          geometry = rep(.$geometry[1:7], each = 7))
> columbus %>% as.data.frame() %>% select(regionId,geometry) %>% unique() %>% 
> dim()
[1] 7 2
>
> # neighbor list and weights
> nblist <- poly2nb(as(columbus, "Spatial"))
> listw0 <- listw <-  nb2listw(nblist, zero.policy=T, style="B")
>
> # 1.Moran I test on primary variable CRIME using original weights list
> (m.test <- moran.test(columbus$CRIME, listw0,
+                       randomisation=FALSE, alternative="greater"))

     Moran I test under normality

data:  columbus$CRIME
weights: listw0

Moran I statistic standard deviate = 2.6121, p-value = 0.004499
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
     0.0497371251     -0.0208333333      0.0007298841

>
> # change weights in listw: two observations within the same region have 
> weight 0.
> columbus$ID = seq(nrow(columbus))
> for (i in seq_along(listw$neighbours)){
+   listw$neighbours[[i]] = setdiff(listw$neighbours[[i]],  
subset(as.data.frame(columbus), regionId == as.data.frame(columbus)[i, 
"regionId"])$ID)
+   listw$weights[[i]] = rep(1, length(listw$neighbours[[i]]))
+
+   if (length(listw$neighbours[[i]]) == 0) {
+     listw$neighbours[[i]] = 0L
+     listw$weights[i] = list(NULL)
+   }
+ }
>
> # 2.Moran I test on primary variable CRIME using modified weights list
> (m.test <- moran.test(columbus$CRIME, listw,
+                       randomisation=FALSE, alternative="greater"))

     Moran I test under normality

data:  columbus$CRIME
weights: listw  n reduced by no-neighbour observations


Moran I statistic standard deviate = 1.1726, p-value = 0.1205
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
      0.014619659      -0.024390244       0.001106761

>
> # the neighbors and weights of the first observation
> ## original listw
> card(listw0$neighbours)[[1]]
[1] 20
> listw0$neighbours[[1]]
[1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
> listw0$weights[[1]]
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> ## modified listw
> card(listw$neighbours)[[1]]
[1] 14
> listw$neighbours[[1]]
[1]  8  9 10 11 12 13 14 15 16 17 18 19 20 21
> listw$weights[[1]]
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> # overview of listw0 and listw
> listw0
Characteristics of weights list object:
Neighbour list object:
Number of regions: 49
Number of nonzero links: 1078
Percentage nonzero weights: 44.89796
Average number of links: 22
2 disjoint connected subgraphs

Weights style: B
Weights constants summary:
   n   nn   S0   S1     S2
B 49 2401 1078 2156 110544
> listw
Characteristics of weights list object:
Neighbour list object:
Number of regions: 49
Number of nonzero links: 784
Percentage nonzero weights: 32.65306
Average number of links: 16
7 regions with no links:
43 44 45 46 47 48 49
8 disjoint connected subgraphs

Weights style: B
Weights constants summary:
   n   nn  S0   S1    S2
B 42 1764 784 1568 65856

Thank you in advance and sorry for the inconvenience.

Hongyun Wang

Sr. Data Scientist � Contracted
Bayer AG
hongyun.wang....@bayer.com

________________________________

The information contained in this e-mail is for the excl...{{dropped:14}}

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

[R-sig-Geo] Spatial autocorrelation test in a dataset with multiple observations per region

Reply via email to