Dear R users,

I hope that someone may be able to assist me with a problem I have.

I have data on the number of sports facilities located within datazones (a 
small measure of area) in Scotland. There are over six thousand datazones in 
Scotland and these are nested within local authorities. There are a high number 
of zero observations so a Poisson regression does not seem to be appropriate 
for the modelling as there is great overdispersion present. I am interested in 
assessing whether or not there is an association between the number of 
facilities and a datazone level measure of deprivation, adjusting for both 
datazone level population (using log(population) as an offset in the model) and 
urban/rural classification.

I tried fitting single level quasipoisson regression models and negative 
binomial models, adopting dummy variables for local authority (a fairly 
inefficient way to tackle this problem) using glm() with family=quasipoisson 
and glm.nb(), respectively. The negative binomial regression appeared to be the 
better option so I then assessed the residuals for this model using moran.mc() 
from the spdep package (an approach recommended by a spatial colleague) and 
found that there was significant residual spatial autocorrelation in my model 
(value of 0.23 with p-value < 0.0001). After adopting a multilevel negative 
binomial model using glmmPQL(), with local authorities as the higher level, the 
residual spatial autocorrelation remained the same.

I hoped to carry out some sort of spatial regression, such as CAR modelling, to 
attempt to remove the spatial autocorrelation to see if this has an impact on 
the association between area level deprivation and the number of facilities. I 
created a neighbours matrix for the datazones but as this is of dimension 
6412x6412, I struggled to identify a modelling approach that was equipped to 
deal with this matrix. I was advised to try a variety of approaches using 
information on the eastings and northings of my datazone centroids, such as 
including polynomials of the eastings and northings or smoothing terms of the 
eastings and northings (using GAM models) in my single level or multilevel 
models and reassessing the residual spatial autocorrelation. I tried this and 
it did not reduce the spatial autocorrelation. I was then advised to include a 
spatial parameter in the model based on the neighbours matrix, as follows:

S <- vector(length=6412)
for(i in 1:6412)
    {
 S[i] <- sum(bigmatrix[i,]*log((walk$all_walk10+0.5)/walk$totpop))
   }

where bigmatrix is my 6412x6412 neighbours matrix of 0s and 1s and, walk is my 
dataset with all_walk10 as my response and totpop is the population size, used 
as the offset. (This is similar to an approach adopted in 'Bayesian quantile 
regression for count data with application to environmental epidemiology', D. 
Lee and T. Neocleous, Royal Statistical Society Journal Series C). This 
approach reduced the residual spatial autocorrelation to 0.16 in the multilevel 
model. However, the correlation remained significant. I have no further ideas 
about how to remove the residual spatial autocorrelation and would appreciate 
any suggestions. It may be the case that the removal of the correlation would 
have no impact on the results obtained and conclusions reached for the 
association between deprivation and the number of facilities. However, I was 
hoping to be able to explore the difference that the removal of the correlation 
would have on the results.

Any suggestions would be greatly appreciated.

Cheers,
Karen 
        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Reply via email to