Re: [R-sig-Geo] Local Moran p-values in Geoda, Python, and R
On Wed, 11 Mar 2015, William May wrote: I sent this to Roger Bivand earlier. Posting it here for posterity, and in case other people have more to add: --- Hi Roger, I'm taking a spatial econometrics class, and we've been making some LISA maps. We noticed that the p-values from the spdep package seem to be a lot different than the results from Geoda or from Python's PySAL package. I looked through the function definitions and noticed that Python, and I think Geoda, use simulations to estimate the p-value, while your localmoran function uses an equation. I attached the graphs from Geoda and from R, and my R code and the original dataset. [R code is added below.] Is this kind of difference normal? If you would put the figures on a website, we could compare them. One possible reason for differences is the alternative= argument to localmoran(). It's default is greater, but you could try two.sided to see whether this gives values closer to GeoDa. You don't say, by the way, which version of GeoDa you are using (since you mention Python, probably not legacy GeoDa), nor whether you can get the results out in tabular form. Are you using the same spatial weights - both can read and write GAL files. Are you aware that knearneigh() takes a longlat= argument, which must be used if the coordinates are not projected. The p-values in localmoran() are analytical ones from: Anselin, L. 1995. Local indicators of spatial association, Geographical Analysis, 27, 93--115. As you mention, in GeoDa they are by leave-one-out permutation bootstrapping. Roger Is there a reason to prefer spdep's results over the Geoda results (or vice versa)? Thanks for any help, Will --- Here's his response: --- Please *do* write to the R-sig-geo list rather than to me directly - others can answer your question as well, perhaps better, and in a more timely way than I can. In addition, threads in the list can be searched in the archives, so others can avoid the same problem later. The only reliable test is localmoran.exact(), followed by localmoran.sad(), for reasons given in the references in their help pages. If you test more than one observation, remember to adjust the p-value for multiple comparisons. Sampling _i (all but i) overrepresents non-neighbours of i for a test on i. The formulae used are from the original paper and most often are similar to the permutation bootstrap values. Crucially, almost all use of local Moran's I (and local Geary and Getis) fall foul of the autocorrelation being related to a mis-specified mean model (omitted explanatory variables and/or omitted adjustment for global autocorrelation). Look at the references to localmoran.sad and localmoran.exact for more details, and at the relevant parts of Waller Gotway (2004), Schabenberger Gotway (2005), and in Bivand et al. (2008, 2013, ASDAR-book, ch. 9). LISA maps are extremely misleading when the mean model is mis-specified, and when the p-values are not adjusted for multiple comparisons. They look simple, but because adjustment for multiple comparisons is a subjective choice, you can almost choose the map you want. Roger --- And here's my R code: --- # make a LISA cluster map library(maptools) library(spdep) setwd('/home/will/classes/polisci_610/1/') # get the data nat - readShapeSpatial(NAT.SHP, ID=FIPSNO) nat.data-data.frame(nat) attach(nat.data) # nearest neighbors (nnb) coords-coordinates(nat) IDs-row.names(as(nat, data.frame)) nat_10nnb-knn2nb(knearneigh(coords, k=10), row.names=IDs) # W identifies row standardized weights nat_10nnb_w - nb2listw(nat_10nnb, style=W) # can preset list_w object; if do this, remember to change if needed list_w - nat_10nnb_w # LISA values lisa - localmoran(HR60, list_w, zero.policy = T) # centers the variable of interest around its mean cDV - HR60 - mean(HR60) # centers the local Moran's around the mean mI - lisa[, 1] C_mI - mI - mean(mI) # but we don't want to center it! Only the sign # matters. quadrant - vector(mode=numeric,length=nrow(lisa)) quadrant[cDV0 mI0] - 1 quadrant[cDV 0 mI0] - 2 quadrant[cDV0 mI0] - 3 quadrant[cDV 0 mI0] - 4 # set a statistical significance level for the local Moran's signif - 0.05 # places non-significant Moran's in the category 5 quadrant[lisa[, 5] signif] - 5 # map png(file=Lisa_map_R.png, width = 680, res = 100) colors - c(red, blue, lightpink, skyblue2, rgb(.95, .95, .95)) par(mar=c(0,0,1,0)) # sets margin parameters for plot space plot(nat, border=grey, col=colors[quadrant], main = LISA Cluster Map, 1960 Homicides) legend(bottomright,legend=c(high-high,low-low,high-low,low-high),
Re: [R-sig-Geo] Local Moran p-values in Geoda, Python, and R
On Thu, 12 Mar 2015, William May wrote: Here's the R LISA map: http://willonrails.com/images/Lisa_map_R.png And here's the Geoda version: http://willonrails.com/images/Lisa_map_geoda.png I'm using Geoda 1.6.6. Since we were getting inconsistent answers between R and Geoda, the professor ran the analysis in Python to get a third opinion. I don't have the Python version of the map on me, but it's very similar to Geoda's. I was able to export Geoda's local I and p-values. I put up a zip file with all the data, R script, the maps, and the spatial weights files for Geoda: willonrails.com/images/lisa_spdep.zip The local Moran's I in R and Geoda are very similar -- on average they only differ by 0.000290044. The p-values, on average, differ by 0.1808931. In both I used 10 nearest neighbors for the weights. I did create the weights separately in R and Geoda. The Geoda weights are in the zip file. I'm new to coordinate projections, but here's the relevant part of the R script: OK, so time for Cartography 010: your input data are in unprojected, geographical coordinates, so you cannot measure the 10 nearest neighbours using Euclidean distance, but rather using Great Circle distance. This switches the neighbours of about 2000 of your 3000 counties. This isn't the cause of the discrepancy. Doing hist(nat2$LISA_P) shows that it only extends from 0 to 0.5, so obviously is not the same as the values generated by spdep::localmoran (in the 0 to 1 range), or even similar. In fact, it is hard to see what has been done, and over 1500 counties are Pr 0.05. Perhaps the upper half has been folded down to the lower half? So for GeoDa, we'd need to establish 1) are the probabilities two-sided or greater than (Ii E(Ii)); 2) have probabilities 0.5 been folded back? The functions in spdep use the greater alternative by default, and two.sided is seldom used, but perhaps should be. Using the p.adjust.method= argument (for example bonferroni or fdr) and accommodating only multiple comparisons among neighbours shifts most p-values into insignificant ranges. I've been looking at: library(spdep) library(maptools) nat - readShapeSpatial(NAT.SHP, ID=FIPSNO) coords - coordinates(nat) IDs - row.names(nat) nat_10nnb_ll - knn2nb(knearneigh(coords, k=10, longlat=TRUE), row.names=IDs) lisa_10nnb_ll - localmoran(nat$HR60, nb2listw(nat_10nnb_ll, style=W), alternative=two.sided, p.adjust.method=fdr) hist(lisa_10nnb_ll[,5]) where only a very few counties have a significant level of local spatial autocorrelation at a 5% cutoff (295, still 10% of the total). Your GeoDa results gave 1566 of the folded p-values 5%, half of your counties. nat$PR_Ii - lisa_10nnb_ll[,5] spplot(nat, PR_Ii, at=c(0,0.05,0.95,1+.Machine$double.eps), col.regions=cm.colors(3), main=FDR-adjusted two.sided Ii) shows a simplified map of the probabilities, dominated by counties not showing local spatial autocorrelation. It will also be easier to move the R nb object to GeoDa as a GAL file, the ordering of the GWT file seems odd. There are two main issues. One is that GeoDa uses permutations to generate probabilities - there is no agreed position on this, and the references I gave earlier suggest that permutating values for counties far from each other may create a false impression of local similarity, hence positive spatial autocorrelation. The second is the very high count of significant results with your data - what is a hot spot when half of the counties are hot? One reason why permuting over all but i may be misleading is if many values of the variable of interest are in one tail - HR60 is highly skewed, with a minimum of 0, a maximum of 92, and an upper hinge of 6. So a local clique of neighbours with values in the 20s will seem very unusual in the permutation setting. Finally, the global Moran's I for HR60 is highly significant, and this indicates anyway that your mean model is mis-specified. Hope this helps, Roger PS. I regularly respond to overenthusiastic usage of measures of this kind when reviewing journal submissions. LISA are useful, but you do need to think carefully about what you are actually trying to do with them. Careful thematic mapping can be a very effective way of pointing up differences in levels of a variable: library(classInt) cI - classIntervals(nat$HR60, n=7, style=fisher) fcI - findColours(cI, pal=rev(bpy.colors(7))) plot(nat, col=fcI, border=transparent) legend(bottomleft, fill=attr(fcI, palette), legend=names(attr(fcI, table)), bty=n) title(main=HR60) or using a ColorBrewer palette. Of course, I have no idea what HR60 represents, and if it is a rate (homicide rate?), it should probably have been represented through funnel plot or empirical Bayes methods anyway. # nearest neighbors (nnb) coords - coordinates(nat) IDs - row.names(as(nat, data.frame)) nat_10nnb - knn2nb(knearneigh(coords, k=10), row.names=IDs)
Re: [R-sig-Geo] gdistance: costDistance with barriers
Karl, You are using a default RasterLayer which has lon/lat coordinates. In that case the earth is considered spherical (or similar), not a plane. The maximum possible distance in longitude is 180 degrees, and the distance between -120 and 120 is not 240 degrees, but 60+60= 120 degrees, . To get the results I think you expected, use a planar CRS. For example r - raster(nrows=18, ncols=36, crs='+proj=utm +zone=10') Robert On Thu, Mar 12, 2015 at 11:35 AM, karljarvis karljar...@gmail.com wrote: Hi all, I am seeing confusing behavior from the costDistance function in gdistance. In general, when cost is constant, cost distance increases linearly with actual distance. However, in this example, it is not doing that for the longest few distances. Am I missing something? Karl r - raster(nrows=18, ncols=36) r[] - 1 t - transition(r,function(x) 1/mean(x),4) p - cbind(seq(-120,120,by=40),rep(0,7)) costDistance(t,p) costDistance(t,p) 1 2 3 4 5 6 2 4 3 8 4 4 12 8 4 5 16 12 8 4 6 16 16 12 8 4 7 12 16 16 12 8 4 -- View this message in context: http://r-sig-geo.2731867.n2.nabble.com/gdistance-costDistance-with-barriers-tp7587891.html Sent from the R-sig-geo mailing list archive at Nabble.com. ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Re: [R-sig-Geo] gdistance: costDistance with barriers
Hi Karl, I think the underlying shortest.paths algorithm is wrapping around the x dimension of the raster. I was able to fix by remove vertices in the underlying adjacency matrix. library(gdistance) r - raster(nrows=18, ncols=36) r[] - 1 t - transition(r,function(x) 1/mean(x),4) p - cbind(seq(-120,120,by=40),rep(0,7)) costDistance(t,p) #my code source(gdistance_source/internal-functions.R) x-t fromCoords-toCoords-p fromCoords - .coordsToMatrix(fromCoords) toCoords - .coordsToMatrix(toCoords) fromCells - cellFromXY(x, fromCoords) toCells - cellFromXY(x, toCoords) ## costDist - matrix(NA, nrow=length(fromCoords[,1]),ncol=length(toCoords[,1])) rownames(costDist) - rownames(fromCoords) colnames(costDist) - rownames(toCoords) y - transitionMatrix(x) #if(isSymmetric(y)) {m - undirected} else{m - directed} adjacencyGraph - graph.adjacency(y, mode=directed) E(adjacencyGraph)$weight - 1/E(adjacencyGraph)$weight uniqueFromCells - unique(fromCells) uniqueToCells - unique(toCells) extent(r) removepnts - cbind(rep(180,181),seq(-90,90,by=40)) removecells - cellFromXY(x, removepnts) adjacencyGraph-delete.vertices(adjacencyGraph,removecells) shortest.paths(adjacencyGraph, v=uniqueFromCells, to=uniqueToCells, mode=out, algorithm=dijkstra) -Original Message- From: R-sig-Geo [mailto:r-sig-geo-boun...@r-project.org] On Behalf Of karljarvis Sent: Thursday, March 12, 2015 2:36 PM To: r-sig-geo@r-project.org Subject: [R-sig-Geo] gdistance: costDistance with barriers Hi all, I am seeing confusing behavior from the costDistance function in gdistance. In general, when cost is constant, cost distance increases linearly with actual distance. However, in this example, it is not doing that for the longest few distances. Am I missing something? Karl r - raster(nrows=18, ncols=36) r[] - 1 t - transition(r,function(x) 1/mean(x),4) p - cbind(seq(-120,120,by=40),rep(0,7)) costDistance(t,p) costDistance(t,p) 1 2 3 4 5 6 2 4 3 8 4 4 12 8 4 5 16 12 8 4 6 16 16 12 8 4 7 12 16 16 12 8 4 -- View this message in context: http://r-sig-geo.2731867.n2.nabble.com/gdistance-costDistance-with-barriers-tp7587891.html Sent from the R-sig-geo mailing list archive at Nabble.com. ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo We value your opinion. Please take a few minutes to share your comments on the service you received from the District by clicking on this linkhttp://my.sfwmd.gov/portal/page/portal/pg_grp_surveysystem/survey%20ext?pid=1653. ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
[R-sig-Geo] gdistance: costDistance with barriers
Hi all, I am seeing confusing behavior from the costDistance function in gdistance. In general, when cost is constant, cost distance increases linearly with actual distance. However, in this example, it is not doing that for the longest few distances. Am I missing something? Karl r - raster(nrows=18, ncols=36) r[] - 1 t - transition(r,function(x) 1/mean(x),4) p - cbind(seq(-120,120,by=40),rep(0,7)) costDistance(t,p) costDistance(t,p) 1 2 3 4 5 6 2 4 3 8 4 4 12 8 4 5 16 12 8 4 6 16 16 12 8 4 7 12 16 16 12 8 4 -- View this message in context: http://r-sig-geo.2731867.n2.nabble.com/gdistance-costDistance-with-barriers-tp7587891.html Sent from the R-sig-geo mailing list archive at Nabble.com. ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Re: [R-sig-Geo] gdistance: costDistance with barriers
Apologies for the confusing subject line... - Karl Jarvis -- View this message in context: http://r-sig-geo.2731867.n2.nabble.com/gdistance-costDistance-with-barriers-tp7587891p7587892.html Sent from the R-sig-geo mailing list archive at Nabble.com. ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
[R-sig-Geo] problem in weigths of the extract function
Dear list, I recently upgraded the raster library and a script that used to run perfectly gave me some mistakes. The problem was in the extract function: r - raster(nrows=180, ncols=360, xmn=-180, xmx=180, ymn=-90, ymx=90) r[] - 1 extract(x = r, y = world_map, weights=T, cellnumber=T) I've tried different combinations of rasters and polygons and there were mistakes with the weights output. The solution I figured was to downgrade the package to the previous version I was using (2.1-49), and everything is working again Just thought this was something to share... Best, Daniel [[alternative HTML version deleted]] ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Re: [R-sig-Geo] gdistance: costDistance with barriers
Robert is right. This example visualizes what is happening. library(gdistance)r - raster(nrows=18, ncols=36) r[] - 1x - transition(r,function(x) 1/mean(x),4)origin - c(-120,0)goal - c(120,0)sl - shortestPath(x, origin, goal)plot(raster(sl)) On Thursday, 12 March 2015, 14:50, Robert J. Hijmans r.hijm...@gmail.com wrote: Karl, You are using a default RasterLayer which has lon/lat coordinates. In that case the earth is considered spherical (or similar), not a plane. The maximum possible distance in longitude is 180 degrees, and the distance between -120 and 120 is not 240 degrees, but 60+60= 120 degrees, . To get the results I think you expected, use a planar CRS. For example r - raster(nrows=18, ncols=36, crs='+proj=utm +zone=10') Robert On Thu, Mar 12, 2015 at 11:35 AM, karljarvis karljar...@gmail.com wrote: Hi all, I am seeing confusing behavior from the costDistance function in gdistance. In general, when cost is constant, cost distance increases linearly with actual distance. However, in this example, it is not doing that for the longest few distances. Am I missing something? Karl r - raster(nrows=18, ncols=36) r[] - 1 t - transition(r,function(x) 1/mean(x),4) p - cbind(seq(-120,120,by=40),rep(0,7)) costDistance(t,p) costDistance(t,p) 1 2 3 4 5 6 2 4 3 8 4 4 12 8 4 5 16 12 8 4 6 16 16 12 8 4 7 12 16 16 12 8 4 -- View this message in context: http://r-sig-geo.2731867.n2.nabble.com/gdistance-costDistance-with-barriers-tp7587891.html Sent from the R-sig-geo mailing list archive at Nabble.com. ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo [[alternative HTML version deleted]] ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Re: [R-sig-Geo] gdistance: costDistance with barriers
Great! Thank you all. On Mar 12, 2015, at 4:35 PM, Jacob van Etten jacobvanet...@yahoo.com wrote: Robert is right. This example visualizes what is happening. library(gdistance) r - raster(nrows=18, ncols=36) r[] - 1 x - transition(r,function(x) 1/mean(x),4) origin - c(-120,0) goal - c(120,0) sl - shortestPath(x, origin, goal) plot(raster(sl)) On Thursday, 12 March 2015, 14:50, Robert J. Hijmans r.hijm...@gmail.com wrote: Karl, You are using a default RasterLayer which has lon/lat coordinates. In that case the earth is considered spherical (or similar), not a plane. The maximum possible distance in longitude is 180 degrees, and the distance between -120 and 120 is not 240 degrees, but 60+60= 120 degrees, . To get the results I think you expected, use a planar CRS. For example r - raster(nrows=18, ncols=36, crs='+proj=utm +zone=10') Robert On Thu, Mar 12, 2015 at 11:35 AM, karljarvis karljar...@gmail.com mailto:karljar...@gmail.com wrote: Hi all, I am seeing confusing behavior from the costDistance function in gdistance. In general, when cost is constant, cost distance increases linearly with actual distance. However, in this example, it is not doing that for the longest few distances. Am I missing something? Karl r - raster(nrows=18, ncols=36) r[] - 1 t - transition(r,function(x) 1/mean(x),4) p - cbind(seq(-120,120,by=40),rep(0,7)) costDistance(t,p) costDistance(t,p) 1 2 3 4 5 6 2 4 3 8 4 4 12 8 4 5 16 12 8 4 6 16 16 12 8 4 7 12 16 16 12 8 4 -- View this message in context: http://r-sig-geo.2731867.n2.nabble.com/gdistance-costDistance-with-barriers-tp7587891.html http://r-sig-geo.2731867.n2.nabble.com/gdistance-costDistance-with-barriers-tp7587891.html Sent from the R-sig-geo mailing list archive at Nabble.com. ___ R-sig-Geo mailing list R-sig-Geo@r-project.org mailto:R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo https://stat.ethz.ch/mailman/listinfo/r-sig-geo ___ R-sig-Geo mailing list R-sig-Geo@r-project.org mailto:R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo https://stat.ethz.ch/mailman/listinfo/r-sig-geo [[alternative HTML version deleted]] ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo