[R-sig-Geo] How to objectively subset cities by population

2017-07-26 Thread Thiago V. dos Santos via R-sig-Geo
Dear all,

I have temperature records of nearly 1200 locations in southern Brazil.

I am writing a shiny app that will show an interactive map with the locations 
plotted as circles, where the user can click a location to see its temperature 
time series.

However, if I show all the locations on the map at once, it looks far too 
cramped.

Therefore, in an attempt to make the map look a bit cleaner, I am trying to 
think of an objective way to subset the locations. My initial approach would be 
to show only the "largest" locations, i.e. the ones with a population above a 
certain threshold.

The problem is: the distribution of the population is so positively skewed that 
I am having a hard time determining the optimal cutoff point.

Does anybody here know any tool or method, possibly spatial, that can assist me 
with this analysis?

These are the locations I am working with:

#---
# Download and summarize
locs <- read.csv("https://www.dropbox.com/s/ykdd8x1mlc76klt/locations.csv?raw=1")
hist(locs$Population)
summary(locs$Population)

# Convert to spatial points and plot
library(sp)
coordinates(locs) <- cbind(locs$Lon, locs$Lat)
plot(locs)
bubble(locs, "Population")
#---
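One possible way to pick such a cutoff objectively (a hedged sketch, not part of the original post) is a natural-breaks classification of the skewed population values, keeping only the top class; the classInt package and the choice of five classes are assumptions here:

#---
# Sketch: Jenks natural breaks on the population values
# (assumes classInt is installed and that locs exists as created above)
library(classInt)
ci <- classIntervals(locs$Population, n = 5, style = "jenks")
cutoff <- ci$brks[length(ci$brks) - 1]    # lower bound of the top class
big <- locs[locs$Population >= cutoff, ]  # keep only the "largest" locations
bubble(big, "Population")
#---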

Thanks in advance,
 -- Thiago V. dos Santos

PhD student
Land and Atmospheric Science
University of Minnesota


[R-sig-Geo] Finding overlapping polygons

2017-07-26 Thread Luciano La Sala via R-sig-Geo
Dear everyone,

I am studying the potential overlap in spatial distribution between invasive 
vertebrate species and native (threatened) ones in the southern cone of 
South America. The invasive species' distribution was grossly 
approximated using an ecoregional approach, which is not relevant here. 
The analysis is based on shapefiles for all terrestrial mammals and 
birds of the world obtained from the BirdLife International 
(http://www.biodiversityinfo.org/spcdownload/r5h8a1/) and IUCN 
(http://www.iucnredlist.org/technical-documents/spatial-data) public 
data portals.

Our goal: using a two-step approach, identify native species (polygons) 
whose distributions overlap, totally or partially, with that of the invasive species.

Our problem: some species with small distributions, which are known to 
occur within the invasive's range, are being left out of the 
final selection (checked in QGIS). I suspect the call to "inside" from the 
fastshp package (assigned to "isin" in the second loop) fails to select some 
of the overlapping polygons in Step 2, or there is some other problem. 
If a reproducible example is absolutely required, I can upload one to Dropbox.

For the sake of simplicity, I will present a case example 
including one invasive species (e.g. wild boar) and all terrestrial 
mammals.

# Analysis of wild boar distributional overlap with IUCN mammal species
install.packages("rgdal")
install.packages("sp")
install.packages("ggplot2")
install.packages("rgeos")
install.packages("raster")
install.packages("fastshp", repos = "http://rforge.net", type = "source")
install.packages("cleangeo")

library(rgdal); library(sp); library(ggplot2); library(rgeos)
library(raster); library(fastshp); library(cleangeo)

mypath <- "/Users/ ... /Terrestrial mammals"
files <- list.files(path = mypath, pattern = "\\.shp$", full.names = TRUE)
length(files)  # 5,465 mammal species

# Load study area
study_area <- read.shp("C:/Users/.../Study_area.shp", format = "table", close = FALSE)
salon <- range(study_area$x, na.rm = TRUE)
salat <- range(study_area$y, na.rm = TRUE)
bounding <- c(salon, salat)
box <- data.frame(salon, salat)
colnames(box) <- c("Longitude", "Latitude")
box

# Step 1. Run all native IUCN species' distributions through a loop and
# select those that overlap with the invasive's bounding box
keep <- rep(NA, length(files))
length(keep)  # 5,465
for (i in 1:length(files)) {
  r <- read.shp(files[i], format = "table")
  rlon <- range(r$x)
  rlat <- range(r$y)
  ## Is it outside the bounding box?
  test.lat <- (min(rlat) > max(salat)) | (max(rlat) < min(salat))
  test.lon <- (min(rlon) > max(salon)) | (max(rlon) < min(salon))
  keep[i] <- !(test.lon | test.lat)
}
keep <- which(keep == 1)  # Index of shapefiles that passed the first filter
length(keep)  # 773 species passed the first filter

# Step 2. Load the invasive species' distribution area
study_area <- readOGR("/Users/.../Study area", "Study_area")
study_area <- spTransform(study_area, CRS("+init=epsg:32721"))  # To UTM
# Get a report of geometry validity and issues for an sp spatial object
report <- clgeo_CollectionReport(study_area)
summary <- clgeo_SummaryReport(report)
summary
# Remove holes and simplify the study-area geometry for computational efficiency
outer <- Filter(function(x) {x@ringDir == 1}, study_area@polygons[[1]]@Polygons)  # check this
study_area <- SpatialPolygons(list(Polygons(outer, ID = 1)))
study_area <- disaggregate(study_area)
areas <- gArea(study_area, byid = TRUE)
# Remove polygons smaller than 3000 km2 (3e9 m2)
study_area <- study_area[which(areas > 3E9), ]
study_area <- gSimplify(study_area, 2, topologyPreserve = TRUE)  # Simplify geometry
proj4string(study_area) <- CRS("+init=epsg:32721")
study_area <- spTransform(study_area,
                          CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))
xy <- fortify(study_area)  # SpatialPolygons to data frame
keep2 <- rep(NA, length(keep))  # 773 species that passed the first filter
for (i in 1:length(keep)) {
  r <- read.shp(files[keep[i]], format = "polygon")
  isin <- any(inside(r, x = xy$long, y = xy$lat), na.rm = TRUE)
  keep2[i] <- isin
}
keep2 <- which(keep2)  # Index of shapefiles that passed the second filter
length(keep2)  # 532 species passed the second filter
species <- list.files(path = mypath, pattern = "\\.shp$", full.names = FALSE)
species <- sub(".shp", "", species)
species <- species[keep[keep2]]



Re: [R-sig-Geo] bivariate spatial correlation in R

2017-07-26 Thread Roger Bivand

On Wed, 26 Jul 2017, Rafael Pereira wrote:


Roger,

This example was provided only for the sake of making the code easily
reproducible for others and I'm more interested in how the bi-variate Moran
could be implemented in R, but your comments are very much welcomed and
I've made changes to the question.

My actual case study looks at bi-variate spatial correlation between (a)
average household income per capita and (b) proportion of jobs in the city
that are accessible under 60 minutes by transit. I don't think I could use
rates in this case but I will normalize the variables using
scale(data$variable).


Please provide a reproducible example, either with a link to a data 
subset, or using a builtin data set. My guess is that you do not need 
bi-variate spatial correlation at all, but rather a spatial regression.


The "causal" variable would then the the proportion of jobs accessible 
within 60 minutes by transit, though this is extremely blunt, and lots of 
other covariates (demography, etc.) impact average household income per 
capita (per block/tract?). Since there are many missing variables in your 
specification, any spatial correlation would be most closely associated 
with them (demography, housing costs, education, etc.), and the choice of 
units of measurement would dominate the outcome.
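A minimal sketch of that kind of spatial regression (hypothetical object and column names, not from the thread; lagsarlm lived in spdep at the time and now lives in spatialreg):

library(spdep)
# tracts: a hypothetical SpatialPolygonsDataFrame with household income per
# capita (income), job accessibility (access) and other covariates
nb <- poly2nb(tracts)
lw <- nb2listw(nb, style = "W", zero.policy = TRUE)
fit <- lagsarlm(income ~ access + education + housing_cost,
                data = tracts@data, listw = lw, zero.policy = TRUE)
summary(fit)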


This is also why bi-variate spatial correlation is seldom a good idea, I 
believe. It can be done, but most likely shouldn't be, unless it can be 
motivated properly.
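If it is properly motivated, a minimal sketch of the bivariate L available in spdep could look like the following (again with hypothetical object and column names standing in for the poster's data):

library(spdep)
# tracts, income, access: hypothetical stand-ins for the poster's data
nb <- poly2nb(tracts)
lw <- nb2listw(nb, style = "W", zero.policy = TRUE)
x <- scale(tracts$income)[, 1]
y <- scale(tracts$access)[, 1]
lee.test(x, y, lw, zero.policy = TRUE)            # analytical test of the global L
lee.mc(x, y, lw, nsim = 999, zero.policy = TRUE)  # permutation test of the global L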


By the way, the weighted and FDR-corrected SAD local Moran's I p-values of 
the black/white ratio for Oregon (your toy example) did deliver the goods: 
if you zoom in with mapview::mapview, you can see that it detects a rate 
hotspot between the rivers.
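A rough sketch of that kind of check, using the basic localmoran() with an FDR correction rather than the weighted saddlepoint (SAD) version referred to above (it also skips the population weighting, so it is only illustrative), building on the oregon.tract1 and lw objects created in the code quoted below:

library(spdep)
lm_rat <- localmoran(oregon.tract1$rat, lw, alternative = "two.sided")
p_fdr  <- p.adjust(lm_rat[, 5], method = "fdr")  # 5th column holds the p-values
oregon.tract1$hotspot <- p_fdr < 0.05
# mapview::mapview(oregon.tract1, zcol = "hotspot")  # zoom in to inspect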


Roger



best,

Rafael H M Pereira

On Mon, Jul 24, 2017 at 7:56 PM, Roger Bivand  wrote:


On Mon, 24 Jul 2017, Rafael Pereira wrote:

Hi all,


I would like to ask whether any of you have conducted bi-variate spatial
correlation in R.

I know the bi-variate Moran's I is not implemented in the spdep library.
I left a question on SO but also wanted to hear if anyone on the mailing list
has come across this.
https://stackoverflow.com/questions/45177590/map-of-bivariat
e-spatial-correlation-in-r-bivariate-lisa

I also know Roger Bivand has implemented the L index proposed by Lee
(2001)
in spdep, but I'm not sure whether the L local correlation
coefficients can be interpreted the same way as the local Moran's I
coefficients. I couldn't find any reference commenting on this issue. I
would very much appreciate your thoughts on this.



In the SO question, and in the follow-up, your presumably throw-away
example makes fundamental mistakes. The code in spdep by Virgilio
Gómez-Rubio is for uni- and bivariate L, and produces point values of local
L. This isn't the main problem, which is rather that you are not taking
account of the underlying population counts, nor shrinking any estimates of
significance to accommodate population sizes. Population sizes vary from 0
to 11858, with the lower quartile at 3164 and upper 5698:
plot(ecdf(oregon.tract$pop2000)). Should you be comparing rates instead?
These are also compositional variables (sum to pop2000, or 1 if rates) with
the other missing components. You would probably be better served by tools
examining spatial segregation, such as for example the seg package.

The 0 count populations cause problems for an unofficial alternative, the
black/white ratio:

oregon.tract1 <- oregon.tract[oregon.tract$white > 0,]
oregon.tract1$rat <- oregon.tract1$black/oregon.tract1$white
nb <- poly2nb(oregon.tract1)
lw <- nb2listw(nb)

which should still be adjusted by weighting:

lm0 <- lm(rat ~ 1, weights=pop2000, data=oregon.tract1)

I'm not advising this, but running localmoran.sad on this model output
yields SAD p-values < 0.05 after FDR correction only in contiguous tracts
on the Washington state line in Portland between the Columbia and
Willamette rivers. So do look at the variables you are using before
rushing into things.

Hope this clarifies,

Roger



best,

Rafael HM Pereira
http://urbandemographics.blogspot.com

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0J=en



Re: [R-sig-Geo] bivariate spatial correlation in R

2017-07-26 Thread Rafael Pereira
Roger,

This example was provided only for the sake of making the code easily
reproducible for others and I'm more interested in how the bi-variate Moran
could be implemented in R, but your comments are very much welcomed and
I've made changes to the question.

My actual case study looks at bi-variate spatial correlation between (a)
average household income per capita and (b) proportion of jobs in the city
that are accessible under 60 minutes by transit. I don't think I could use
rates in this case but I will normalize the variables using
scale(data$variable).

best,

Rafael H M Pereira

On Mon, Jul 24, 2017 at 7:56 PM, Roger Bivand  wrote:

> On Mon, 24 Jul 2017, Rafael Pereira wrote:
>
> Hi all,
>>
>> I would like to ask whether any of you have conducted bi-variate spatial
>> correlation in R.
>>
>> I know the bi-variate Moran's I is not implemented in the spdep library.
>> I left a question on SO but also wanted to hear if anyone on the mailing list
>> has come across this.
>> https://stackoverflow.com/questions/45177590/map-of-bivariat
>> e-spatial-correlation-in-r-bivariate-lisa
>>
>> I also know Roger Bivand has implemented the L index proposed by Lee
>> (2001)
>> in spdep, but I'm not sure whether the L local correlation
>> coefficients can be interpreted the same way as the local Moran's I
>> coefficients. I couldn't find any reference commenting on this issue. I
>> would very much appreciate your thoughts on this.
>>
>
> In the SO question, and in the follow-up, your presumably throw-away
> example makes fundamental mistakes. The code in spdep by Virgilio
> Gómez-Rubio is for uni- and bivariate L, and produces point values of local
> L. This isn't the main problem, which is rather that you are not taking
> account of the underlying population counts, nor shrinking any estimates of
> significance to accommodate population sizes. Population sizes vary from 0
> to 11858, with the lower quartile at 3164 and upper 5698:
> plot(ecdf(oregon.tract$pop2000)). Should you be comparing rates instead?
> These are also compositional variables (sum to pop2000, or 1 if rates) with
> the other missing components. You would probably be better served by tools
> examining spatial segregation, such as for example the seg package.
>
> The 0 count populations cause problems for an unofficial alternative, the
> black/white ratio:
>
> oregon.tract1 <- oregon.tract[oregon.tract$white > 0,]
> oregon.tract1$rat <- oregon.tract1$black/oregon.tract1$white
> nb <- poly2nb(oregon.tract1)
> lw <- nb2listw(nb)
>
> which should still be adjusted by weighting:
>
> lm0 <- lm(rat ~ 1, weights=pop2000, data=oregon.tract1)
>
> I'm not advising this, but running localmoran.sad on this model output
> yields SAD p-values < 0.05 after FDR correction only in contiguous tracts
> on the Washington state line in Portland between the Columbia and
> Willamette rivers. So do look at the variables you are using before
> rushing into things.
>
> Hope this clarifies,
>
> Roger
>
>
>> best,
>>
>> Rafael HM Pereira
>> http://urbandemographics.blogspot.com
>>
>>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no
> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
> http://orcid.org/-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0J=en
