Dear R users, I hope that someone may be able to assist me with a problem I have.
I have data on the number of sports facilities located within datazones (a small measure of area) in Scotland. There are over six thousand datazones in Scotland and these are nested within local authorities. There are a high number of zero observations so a Poisson regression does not seem to be appropriate for the modelling as there is great overdispersion present. I am interested in assessing whether or not there is an association between the number of facilities and a datazone level measure of deprivation, adjusting for both datazone level population (using log(population) as an offset in the model) and urban/rural classification. I tried fitting single level quasipoisson regression models and negative binomial models, adopting dummy variables for local authority (a fairly inefficient way to tackle this problem) using glm() with family=quasipoisson and glm.nb(), respectively. The negative binomial regression appeared to be the better option so I then assessed the residuals for this model using moran.mc() from the spdep package (an approach recommended by a spatial colleague) and found that there was significant residual spatial autocorrelation in my model (value of 0.23 with p-value < 0.0001). After adopting a multilevel negative binomial model using glmmPQL(), with local authorities as the higher level, the residual spatial autocorrelation remained the same. I hoped to carry out some sort of spatial regression, such as CAR modelling, to attempt to remove the spatial autocorrelation to see if this has an impact on the association between area level deprivation and the number of facilities. I created a neighbours matrix for the datazones but as this is of dimension 6412x6412, I struggled to identify a modelling approach that was equipped to deal with this matrix. I was advised to try a variety of approaches using information on the eastings and northings of my datazone centroids, such as including polynomials of the eastings and northings or smoothing terms of the eastings and northings (using GAM models) in my single level or multilevel models and reassessing the residual spatial autocorrelation. I tried this and it did not reduce the spatial autocorrelation. I was then advised to include a spatial parameter in the model based on the neighbours matrix, as follows: S <- vector(length=6412) for(i in 1:6412) { S[i] <- sum(bigmatrix[i,]*log((walk$all_walk10+0.5)/walk$totpop)) } where bigmatrix is my 6412x6412 neighbours matrix of 0s and 1s and, walk is my dataset with all_walk10 as my response and totpop is the population size, used as the offset. (This is similar to an approach adopted in 'Bayesian quantile regression for count data with application to environmental epidemiology', D. Lee and T. Neocleous, Royal Statistical Society Journal Series C). This approach reduced the residual spatial autocorrelation to 0.16 in the multilevel model. However, the correlation remained significant. I have no further ideas about how to remove the residual spatial autocorrelation and would appreciate any suggestions. It may be the case that the removal of the correlation would have no impact on the results obtained and conclusions reached for the association between deprivation and the number of facilities. However, I was hoping to be able to explore the difference that the removal of the correlation would have on the results. Any suggestions would be greatly appreciated. Cheers, Karen [[alternative HTML version deleted]] _______________________________________________ R-sig-Geo mailing list R-sig-Geo@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo