Re: [R-sig-Geo] Local Moran p-values in Geoda, Python, and R

2015-03-12 Thread Roger Bivand

On Wed, 11 Mar 2015, William May wrote:


I sent this to Roger Bivand earlier. Posting it here for
posterity, and in case other people have more to add:

---
Hi Roger,

I'm taking a spatial econometrics class, and we've been making some
LISA maps. We noticed that the p-values from the spdep package seem to
be a lot different than the results from Geoda or from Python's PySAL
package.

I looked through the function definitions and noticed that Python, and
I think Geoda, use simulations to estimate the p-value, while your
localmoran function uses an equation.

I attached the graphs from Geoda and from R, and my R code and the
original dataset. [R code is added below.] Is this kind of
difference normal?


If you would put the figures on a website, we could compare them. One 
possible reason for differences is the alternative= argument to 
localmoran(). It's default is greater, but you could try two.sided to 
see whether this gives values closer to GeoDa. You don't say, by the way, 
which version of GeoDa you are using (since you mention Python, probably 
not legacy GeoDa), nor whether you can get the results out in tabular 
form. Are you using the same spatial weights - both can read and write GAL 
files. Are you aware that knearneigh() takes a longlat= argument, which 
must be used if the coordinates are not projected. The p-values in 
localmoran() are analytical ones from: Anselin, L. 1995. Local indicators 
of spatial association, Geographical Analysis, 27, 93--115. As you 
mention, in GeoDa they are by leave-one-out permutation bootstrapping.


Roger



Is there a reason to prefer spdep's results over the Geoda results (or
vice versa)?

Thanks for any help,

Will
---


Here's his response:

---
Please *do* write to the R-sig-geo list rather than to me directly -
others can answer your question as well, perhaps better, and in a more
timely way than I can. In addition, threads in the list can be
searched in the archives, so others can avoid the same problem later.

The only reliable test is localmoran.exact(), followed by
localmoran.sad(), for reasons given in the references in their help
pages.  If you test more than one observation, remember to adjust the
p-value for multiple comparisons. Sampling _i (all but i)
overrepresents non-neighbours of i for a test on i. The formulae used
are from the original paper and most often are similar to the
permutation bootstrap values.

Crucially, almost all use of local Moran's I (and local Geary and
Getis) fall foul of the autocorrelation being related to a
mis-specified mean model (omitted explanatory variables and/or omitted
adjustment for global autocorrelation). Look at the references to
localmoran.sad and localmoran.exact for more details, and at the
relevant parts of Waller  Gotway (2004), Schabenberger  Gotway
(2005), and in Bivand et al. (2008, 2013, ASDAR-book, ch. 9).

LISA maps are extremely misleading when the mean model is
mis-specified, and when the p-values are not adjusted for multiple
comparisons. They look simple, but because adjustment for multiple
comparisons is a subjective choice, you can almost choose the map you
want.

Roger
---


And here's my R code:

---
# make a LISA cluster map

library(maptools)
library(spdep)

setwd('/home/will/classes/polisci_610/1/')

# get the data
nat - readShapeSpatial(NAT.SHP, ID=FIPSNO)
nat.data-data.frame(nat)
attach(nat.data)

# nearest neighbors (nnb)
coords-coordinates(nat)
IDs-row.names(as(nat, data.frame))
nat_10nnb-knn2nb(knearneigh(coords, k=10), row.names=IDs)

# W identifies row standardized weights
nat_10nnb_w - nb2listw(nat_10nnb, style=W)

# can preset list_w object; if do this, remember to change if needed
list_w - nat_10nnb_w

# LISA values
lisa - localmoran(HR60, list_w, zero.policy = T)

# centers the variable of interest around its mean
cDV - HR60 - mean(HR60)

# centers the local Moran's around the mean
mI - lisa[, 1]
C_mI - mI - mean(mI) # but we don't want to center it! Only the sign
# matters.

quadrant - vector(mode=numeric,length=nrow(lisa))
quadrant[cDV0  mI0] - 1
quadrant[cDV 0  mI0] - 2 
quadrant[cDV0  mI0] - 3
quadrant[cDV 0  mI0] - 4

# set a statistical significance level for the local Moran's
signif - 0.05

# places non-significant Moran's in the category 5
quadrant[lisa[, 5] signif] - 5


# map
png(file=Lisa_map_R.png, width = 680, res = 100)
colors - c(red, blue, lightpink, skyblue2, rgb(.95, .95, .95))
par(mar=c(0,0,1,0)) # sets margin parameters for plot space
plot(nat, border=grey, col=colors[quadrant],
      main = LISA Cluster Map, 1960 Homicides)
legend(bottomright,legend=c(high-high,low-low,high-low,low-high),
  

Re: [R-sig-Geo] Local Moran p-values in Geoda, Python, and R

2015-03-12 Thread Roger Bivand

On Thu, 12 Mar 2015, William May wrote:


Here's the R LISA map:
http://willonrails.com/images/Lisa_map_R.png

And here's the Geoda version:
http://willonrails.com/images/Lisa_map_geoda.png

I'm using Geoda 1.6.6. Since we were getting inconsistent answers
between R and Geoda, the professor ran the analysis in Python to
get a third opinion. I don't have the Python version of the map
on me, but it's very similar to Geoda's.

I was able to export Geoda's local I and p-values. I put up a zip file
with all the data, R script, the maps, and the spatial weights files
for Geoda:
willonrails.com/images/lisa_spdep.zip

The local Moran's I in R and Geoda are very similar -- on average they
only differ by 0.000290044.

The p-values, on average, differ by 0.1808931.

In both I used 10 nearest neighbors for the weights. I did create the
weights separately in R and Geoda. The Geoda weights are in the zip
file. I'm new to coordinate projections, but here's the relevant part
of the R script:


OK, so time for Cartography 010: your input data are in unprojected, 
geographical coordinates, so you cannot measure the 10 nearest neighbours 
using Euclidean distance, but rather using Great Circle distance. This 
switches the neighbours of about 2000 of your 3000 counties. This isn't 
the cause of the discrepancy.


Doing hist(nat2$LISA_P) shows that it only extends from 0 to 0.5, so 
obviously is not the same as the values generated by spdep::localmoran (in 
the 0 to 1 range), or even similar. In fact, it is hard to see what has 
been done, and over 1500 counties are Pr  0.05. Perhaps the upper half 
has been folded down to the lower half?


So for GeoDa, we'd need to establish 1) are the probabilities two-sided or 
greater than (Ii  E(Ii)); 2) have probabilities  0.5 been folded back?


The functions in spdep use the greater alternative by default, and 
two.sided is seldom used, but perhaps should be. Using the 
p.adjust.method= argument (for example bonferroni or fdr) and 
accommodating only multiple comparisons among neighbours shifts most 
p-values into insignificant ranges.


I've been looking at:

library(spdep)
library(maptools)
nat - readShapeSpatial(NAT.SHP, ID=FIPSNO)
coords - coordinates(nat)
IDs - row.names(nat)
nat_10nnb_ll - knn2nb(knearneigh(coords, k=10, longlat=TRUE),
 row.names=IDs)
lisa_10nnb_ll - localmoran(nat$HR60, nb2listw(nat_10nnb_ll, style=W),
 alternative=two.sided, p.adjust.method=fdr)
hist(lisa_10nnb_ll[,5])

where only a very few counties have a significant level of local spatial 
autocorrelation at a 5% cutoff (295, still 10% of the total). Your 
GeoDa results gave 1566 of the folded p-values  5%, half of your 
counties.


nat$PR_Ii - lisa_10nnb_ll[,5]
spplot(nat, PR_Ii, at=c(0,0.05,0.95,1+.Machine$double.eps),
  col.regions=cm.colors(3), main=FDR-adjusted two.sided Ii)

shows a simplified map of the probabilities, dominated by counties not 
showing local spatial autocorrelation.


It will also be easier to move the R nb object to GeoDa as a GAL file, the 
ordering of the GWT file seems odd.


There are two main issues. One is that GeoDa uses permutations to generate 
probabilities - there is no agreed position on this, and the references I 
gave earlier suggest that permutating values for counties far from each 
other may create a false impression of local similarity, hence positive 
spatial autocorrelation. The second is the very high count of 
significant results with your data - what is a hot spot when half of the 
counties are hot?


One reason why permuting over all but i may be misleading is if many 
values of the variable of interest are in one tail - HR60 is highly 
skewed, with a minimum of 0, a maximum of 92, and an upper hinge of 6. So 
a local clique of neighbours with values in the 20s will seem very unusual 
in the permutation setting.


Finally, the global Moran's I for HR60 is highly significant, and this 
indicates anyway that your mean model is mis-specified.


Hope this helps,

Roger

PS. I regularly respond to overenthusiastic usage of measures of this kind 
when reviewing journal submissions. LISA are useful, but you do need to 
think carefully about what you are actually trying to do with them. 
Careful thematic mapping can be a very effective way of pointing up 
differences in levels of a variable:


library(classInt)
cI - classIntervals(nat$HR60, n=7, style=fisher)
fcI - findColours(cI, pal=rev(bpy.colors(7)))
plot(nat, col=fcI, border=transparent)
legend(bottomleft, fill=attr(fcI, palette), legend=names(attr(fcI,
 table)), bty=n)
title(main=HR60)

or using a ColorBrewer palette. Of course, I have no idea what HR60 
represents, and if it is a rate (homicide rate?), it should probably have 
been represented through funnel plot or empirical Bayes methods anyway.





# nearest neighbors (nnb)
coords - coordinates(nat)
IDs - row.names(as(nat, data.frame))
nat_10nnb - knn2nb(knearneigh(coords, k=10), row.names=IDs)

Re: [R-sig-Geo] gdistance: costDistance with barriers

2015-03-12 Thread Robert J. Hijmans
Karl,

You are using a default RasterLayer which has lon/lat coordinates. In
that case the earth is considered spherical (or similar), not a plane.
The maximum possible distance in longitude is 180 degrees, and the
distance between -120 and 120 is not 240 degrees, but 60+60= 120
degrees, .

To get the results I think you expected, use a planar CRS. For example
r - raster(nrows=18, ncols=36, crs='+proj=utm +zone=10')

Robert

On Thu, Mar 12, 2015 at 11:35 AM, karljarvis karljar...@gmail.com wrote:
 Hi all,
 I am seeing confusing behavior from the costDistance function in gdistance.
 In general, when cost is constant, cost distance increases linearly with
 actual distance. However, in this example, it is not doing that for the
 longest few distances. Am I missing something?
 Karl

 r - raster(nrows=18, ncols=36)
 r[] - 1
 t - transition(r,function(x) 1/mean(x),4)
 p - cbind(seq(-120,120,by=40),rep(0,7))
 costDistance(t,p)

 costDistance(t,p)
1  2  3  4  5  6
 2  4
 3  8  4
 4 12  8  4
 5 16 12  8  4
 6 16 16 12  8  4
 7 12 16 16 12  8  4



 --
 View this message in context: 
 http://r-sig-geo.2731867.n2.nabble.com/gdistance-costDistance-with-barriers-tp7587891.html
 Sent from the R-sig-geo mailing list archive at Nabble.com.

 ___
 R-sig-Geo mailing list
 R-sig-Geo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-geo

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


Re: [R-sig-Geo] gdistance: costDistance with barriers

2015-03-12 Thread Stachelek, Joseph
Hi Karl,

I think the underlying shortest.paths algorithm is wrapping around the x 
dimension of the raster. I was able to fix by remove vertices in the underlying 
adjacency matrix.

library(gdistance)
r - raster(nrows=18, ncols=36)
r[] - 1
t - transition(r,function(x) 1/mean(x),4)
p - cbind(seq(-120,120,by=40),rep(0,7))
costDistance(t,p)


#my code
source(gdistance_source/internal-functions.R)
x-t
fromCoords-toCoords-p

fromCoords - .coordsToMatrix(fromCoords)
toCoords - .coordsToMatrix(toCoords)

fromCells - cellFromXY(x, fromCoords)
toCells - cellFromXY(x, toCoords)

##
costDist - matrix(NA, nrow=length(fromCoords[,1]),ncol=length(toCoords[,1]))
rownames(costDist) - rownames(fromCoords)
colnames(costDist) - rownames(toCoords)
y - transitionMatrix(x)
#if(isSymmetric(y)) {m - undirected} else{m - directed}
adjacencyGraph - graph.adjacency(y, mode=directed)
E(adjacencyGraph)$weight - 1/E(adjacencyGraph)$weight

uniqueFromCells - unique(fromCells)
uniqueToCells - unique(toCells)

extent(r)
removepnts - cbind(rep(180,181),seq(-90,90,by=40))
removecells - cellFromXY(x, removepnts)
adjacencyGraph-delete.vertices(adjacencyGraph,removecells)

shortest.paths(adjacencyGraph, v=uniqueFromCells, to=uniqueToCells, mode=out, 
algorithm=dijkstra)








-Original Message-
From: R-sig-Geo [mailto:r-sig-geo-boun...@r-project.org] On Behalf Of karljarvis
Sent: Thursday, March 12, 2015 2:36 PM
To: r-sig-geo@r-project.org
Subject: [R-sig-Geo] gdistance: costDistance with barriers

Hi all,
I am seeing confusing behavior from the costDistance function in gdistance.
In general, when cost is constant, cost distance increases linearly with
actual distance. However, in this example, it is not doing that for the
longest few distances. Am I missing something?
Karl

r - raster(nrows=18, ncols=36)
r[] - 1
t - transition(r,function(x) 1/mean(x),4)
p - cbind(seq(-120,120,by=40),rep(0,7))
costDistance(t,p)

 costDistance(t,p)
   1  2  3  4  5  6
2  4
3  8  4
4 12  8  4
5 16 12  8  4
6 16 16 12  8  4
7 12 16 16 12  8  4



--
View this message in context: 
http://r-sig-geo.2731867.n2.nabble.com/gdistance-costDistance-with-barriers-tp7587891.html
Sent from the R-sig-geo mailing list archive at Nabble.com.

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


We value your opinion. Please take a few minutes to share your comments on the 
service you received from the District by clicking on this 
linkhttp://my.sfwmd.gov/portal/page/portal/pg_grp_surveysystem/survey%20ext?pid=1653.

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


[R-sig-Geo] gdistance: costDistance with barriers

2015-03-12 Thread karljarvis
Hi all,
I am seeing confusing behavior from the costDistance function in gdistance.
In general, when cost is constant, cost distance increases linearly with
actual distance. However, in this example, it is not doing that for the
longest few distances. Am I missing something?
Karl

r - raster(nrows=18, ncols=36) 
r[] - 1
t - transition(r,function(x) 1/mean(x),4)
p - cbind(seq(-120,120,by=40),rep(0,7))
costDistance(t,p)

 costDistance(t,p)
   1  2  3  4  5  6
2  4   
3  8  4
4 12  8  4 
5 16 12  8  4  
6 16 16 12  8  4   
7 12 16 16 12  8  4



--
View this message in context: 
http://r-sig-geo.2731867.n2.nabble.com/gdistance-costDistance-with-barriers-tp7587891.html
Sent from the R-sig-geo mailing list archive at Nabble.com.

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


Re: [R-sig-Geo] gdistance: costDistance with barriers

2015-03-12 Thread karljarvis
Apologies for the confusing subject line...



-
Karl Jarvis
--
View this message in context: 
http://r-sig-geo.2731867.n2.nabble.com/gdistance-costDistance-with-barriers-tp7587891p7587892.html
Sent from the R-sig-geo mailing list archive at Nabble.com.

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


[R-sig-Geo] problem in weigths of the extract function

2015-03-12 Thread Daniel Varajão de Latorre
Dear list,

I recently upgraded the raster library and a script that used to run
perfectly gave me some mistakes. The problem was in the extract function:

r - raster(nrows=180, ncols=360, xmn=-180, xmx=180, ymn=-90, ymx=90)
r[] - 1
extract(x = r, y = world_map, weights=T, cellnumber=T)

I've tried different combinations of rasters and polygons and there were
mistakes with the weights output.

The solution I figured was to downgrade the package to the previous version
I was using (2.1-49), and everything is working again

Just thought this was something to share...

Best,
Daniel

[[alternative HTML version deleted]]

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


Re: [R-sig-Geo] gdistance: costDistance with barriers

2015-03-12 Thread Jacob van Etten via R-sig-Geo
Robert is right. This example visualizes what is happening.
library(gdistance)r - raster(nrows=18, ncols=36) r[] - 1x - 
transition(r,function(x) 1/mean(x),4)origin - c(-120,0)goal - c(120,0)sl - 
shortestPath(x, origin, goal)plot(raster(sl)) 

 On Thursday, 12 March 2015, 14:50, Robert J. Hijmans r.hijm...@gmail.com 
wrote:
   

 Karl,

You are using a default RasterLayer which has lon/lat coordinates. In
that case the earth is considered spherical (or similar), not a plane.
The maximum possible distance in longitude is 180 degrees, and the
distance between -120 and 120 is not 240 degrees, but 60+60= 120
degrees, .

To get the results I think you expected, use a planar CRS. For example
r - raster(nrows=18, ncols=36, crs='+proj=utm +zone=10')

Robert

On Thu, Mar 12, 2015 at 11:35 AM, karljarvis karljar...@gmail.com wrote:
 Hi all,
 I am seeing confusing behavior from the costDistance function in gdistance.
 In general, when cost is constant, cost distance increases linearly with
 actual distance. However, in this example, it is not doing that for the
 longest few distances. Am I missing something?
 Karl

 r - raster(nrows=18, ncols=36)
 r[] - 1
 t - transition(r,function(x) 1/mean(x),4)
 p - cbind(seq(-120,120,by=40),rep(0,7))
 costDistance(t,p)

 costDistance(t,p)
    1  2  3  4  5  6
 2  4
 3  8  4
 4 12  8  4
 5 16 12  8  4
 6 16 16 12  8  4
 7 12 16 16 12  8  4



 --
 View this message in context: 
 http://r-sig-geo.2731867.n2.nabble.com/gdistance-costDistance-with-barriers-tp7587891.html
 Sent from the R-sig-geo mailing list archive at Nabble.com.

 ___
 R-sig-Geo mailing list
 R-sig-Geo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-geo

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


   
[[alternative HTML version deleted]]

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


Re: [R-sig-Geo] gdistance: costDistance with barriers

2015-03-12 Thread Karl Jarvis
Great! Thank you all. 

 On Mar 12, 2015, at 4:35 PM, Jacob van Etten jacobvanet...@yahoo.com wrote:
 
 Robert is right. This example visualizes what is happening.
 
 library(gdistance)
 r - raster(nrows=18, ncols=36) 
 r[] - 1
 x - transition(r,function(x) 1/mean(x),4)
 origin - c(-120,0)
 goal - c(120,0)
 sl - shortestPath(x, origin, goal)
 plot(raster(sl))
 
 
 On Thursday, 12 March 2015, 14:50, Robert J. Hijmans r.hijm...@gmail.com 
 wrote:
 
 
 Karl,
 
 You are using a default RasterLayer which has lon/lat coordinates. In
 that case the earth is considered spherical (or similar), not a plane.
 The maximum possible distance in longitude is 180 degrees, and the
 distance between -120 and 120 is not 240 degrees, but 60+60= 120
 degrees, .
 
 To get the results I think you expected, use a planar CRS. For example
 r - raster(nrows=18, ncols=36, crs='+proj=utm +zone=10')
 
 Robert
 
 On Thu, Mar 12, 2015 at 11:35 AM, karljarvis karljar...@gmail.com 
 mailto:karljar...@gmail.com wrote:
  Hi all,
  I am seeing confusing behavior from the costDistance function in gdistance.
  In general, when cost is constant, cost distance increases linearly with
  actual distance. However, in this example, it is not doing that for the
  longest few distances. Am I missing something?
  Karl
 
  r - raster(nrows=18, ncols=36)
  r[] - 1
  t - transition(r,function(x) 1/mean(x),4)
  p - cbind(seq(-120,120,by=40),rep(0,7))
  costDistance(t,p)
 
  costDistance(t,p)
 1  2  3  4  5  6
  2  4
  3  8  4
  4 12  8  4
  5 16 12  8  4
  6 16 16 12  8  4
  7 12 16 16 12  8  4
 
 
 
  --
  View this message in context: 
  http://r-sig-geo.2731867.n2.nabble.com/gdistance-costDistance-with-barriers-tp7587891.html
   
  http://r-sig-geo.2731867.n2.nabble.com/gdistance-costDistance-with-barriers-tp7587891.html
  Sent from the R-sig-geo mailing list archive at Nabble.com.
 
  ___
  R-sig-Geo mailing list
  R-sig-Geo@r-project.org mailto:R-sig-Geo@r-project.org
  https://stat.ethz.ch/mailman/listinfo/r-sig-geo 
  https://stat.ethz.ch/mailman/listinfo/r-sig-geo
 
 
 ___
 R-sig-Geo mailing list
 R-sig-Geo@r-project.org mailto:R-sig-Geo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-geo 
 https://stat.ethz.ch/mailman/listinfo/r-sig-geo
 
 


[[alternative HTML version deleted]]

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo