[R-sig-Geo] How to objectively subset cities by population
Dear all, I have temperature records of nearly 1200 locations in southern Brazil. I am writing a shiny app that will show an interactive map with the locations plotted as circles, where the user can click a location to see its temperature time series. However, if I show all the locations in the map, it will look really bad, too cramped. Therefore, in an attempt to make the map look a bit cleaner, I am trying to think of an objective way to subset the locations. My initial approach would be to show only the "largest" locations, i.e. the ones with a population above a certain threshold. The problem is: the distribution of the population is so positively skewed that I am having a hard time determining the optimal cutoff point. Does anybody here know any tool or method, possibly spatial, that can assist me with this analysis? These are the locations I am working with: #--- # Download and summarize locs <- read.csv("https://www.dropbox.com/s/ykdd8x1mlc76klt/locations.csv?raw=1;) hist(locs$Population) summary(locs$Population) # Convert to spatial points and plot require(sp) coordinates(locs) <- cbind(locs$Lon , locs$Lat) plot(locs) bubble(locs,"Population") #--- Thanks in advance, -- Thiago V. dos Santos PhD student Land and Atmospheric Science University of Minnesota [[alternative HTML version deleted]] ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
[R-sig-Geo] Finding overlapping polygons
Dear everyone, I am studying the potential spatial distribution between vertebrate invasive species and native ones (threatened) in the souther cone of South America. The invasive species' distribution was grossly approximated using an ecoregional approach, which is not relevant here. The analysis is based on shapefiles for all terrestrial mammals and birds of the world obtained from BirdLife International (http://www.biodiversityinfo.org/spcdownload/r5h8a1/) and IUCN (http://www.iucnredlist.org/technical-documents/spatial-data) public domains. Our goal: using a two-step approach, identify native species (polygons) whose distributionoverlap -totally or partially- with that of the invasive. Our problem: Some species (with small distribution) which are known to occur within the invasive distribution range are being left out of the final selection (checkedin QGIS). I suspect the function "isin" in the fastshp package (2nd loop) fails to select some of the overlapping polygons in Step 2, or some other problem. If a reproducible example is absolutely required, I can upload to dropbox. For the sake of simplicity, I will present present a case example including one invasive species (e.g. wild boar) and all terrestrial mammals. # Analysis of Wild Boar distributional overlap with IUCN mammal species install.packages("rgdal") install.packages("sp") install.packages("ggplot2") install.packages("rgeos") install.packages("raster") install.packages("fastshp", repos = "http://rforge.net;, type = "source")install.packages("cleangeo") mypath <- ("/Users/ ... /Terrestrial mammals") files <- list.files(path=mypath, pattern = "\\.shp$", full.names=T) length(files) # 5,465 mammal species # Load study area study_area <- read.shp("C:/Users/.../Study_area.shp", format="table", close=FALSE) salon <- range(study_area$x, na.rm=T) salat <- range(study_area$y, na.rm=T) bounding <- c(salon, salat) box <- data.frame(salon, salat) colnames(box) <- c("Longitud","Latitude") box # Step 1. Run all native IUCN species' distribution through a loop and select those which overlap with the invasive's bounding box distribution keep <- rep(NA, length(files)) length(keep) # 5,465 for(i in 1:length(files)){ r <- read.shp(files[i], format="table") rlon <- range(r$x) rlat <- range(r$y) ## Is out of the bounding box? test.lat <- (min(rlat) > max(salat)) | (max(rlat) < min(salat)) test.lon <- (min(rlon) > max(salon)) | (max(rlon) < min(salon)) keep[i] <- !(test.lon | test.lat) } keep <- which(keep==1) # Index of shapefiles that passed the first filter length(keep) # 773 species passed 1st filter # Step 2. Load invasive species distribution area study_area <- readOGR("/Users/.../Study area","Study_area") study_area <- spTransform(study_area, CRS("+init=epsg:32721")) # To UTM # Get a report of geometry validity & issues for a sp spatial object report <- clgeo_CollectionReport(study_area) summary <- clgeo_SummaryReport(report) summary # Remove holes and simplify geometry of study area for computational efficiency outer <- Filter(function(x){x@ringDir==1}, study_area@polygons[[1]]@Polygons) # PREGUNTAR study_area <- SpatialPolygons(list(Polygons(outer, ID=1))) study_area <- disaggregate(study_area) areas <- gArea(study_area, byid=TRUE) #Remove polygons smaller than 3000 km2 study_area <- study_area[which(areas > 3E9),] study_area <- gSimplify(study_area, 2, topologyPreserve=TRUE) # Simplify geometry proj4string(study_area) <- CRS("+init=epsg:32721") study_area <- spTransform(study_area, CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")) xy <- fortify(study_area) # SpatialPolygons to data frame keep2 <- rep(NA,length(keep)) # 773 sp. that passed first filter for(i in 1:length(keep)){ r <- read.shp(files[keep[i]], format="polygon") isin <- any(inside(r, x=xy$long, y=xy$lat), na.rm=T) keep2[i] <- isin } keep2 <- which(keep2) # Index of shapefiles that passed the second sort length(keep2) # 532 species that passed the second filter species <- list.files(path=mypath, pattern = "\\.shp$",full.names=F) species <- sub(".shp", "", species) species <- species[keep[keep2]] --- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus [[alternative HTML version deleted]] . ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
[R-sig-Geo] Finding overlapping polygons
Dear everyone, I am studying the potential spatal distribution between vertebrate invasive species and native, threatened ones in the souther cone of South America. The invasive species' distribution was grossly approximated using an ecoregion approach, which is not relevant here. Then, I downloaded shapefiles for all terrestrial mammals and birds of the world from BirdLife International (http://www.biodiversityinfo.org/spcdownload/r5h8a1/) and IUCN (http://www.iucnredlist.org/technical-documents/spatial-data) public domains. Our goal: using a two-step approach, identify native species (polygons) whose distributionoverlap -totally or partially- with that of the invasive. Our problem: Some species (with small distribution) which are known to occur within the invasive distribution range are being left out of the final selection (checkedin QGIS). I suspect the function "isin" in the fastshp package (2nd loop) fails to select some of the overlapping polygons in Step 2, or some other problem. If a reproducible example is absolutely required, I can upload to dropbox. For the sake of simplicity, I will present present a case example including one invasive species (e.g. wild boar) and all terrestrial mammals. # Analysis of Wild Boar distributional overlap with IUCN mammal species install.packages("rgdal") install.packages("sp") install.packages("ggplot2") install.packages("rgeos") install.packages("raster") install.packages("fastshp", repos = "http://rforge.net;, type = "source")install.packages("cleangeo") mypath <- ("/Users/ ... /Terrestrial mammals") files <- list.files(path=mypath, pattern = "\\.shp$", full.names=T) length(files) # 5,465 mammal species # Load study area study_area <- read.shp("C:/Users/.../Study_area.shp", format="table", close=FALSE) salon <- range(study_area$x, na.rm=T) salat <- range(study_area$y, na.rm=T) bounding <- c(salon, salat) box <- data.frame(salon, salat) colnames(box) <- c("Longitud","Latitude") box # Step 1. Run all native IUCN species' distribution through a loop and select those which overlap with the invasive's bounding box distribution keep <- rep(NA, length(files)) length(keep) # 5,465 for(i in 1:length(files)){ r <- read.shp(files[i], format="table") rlon <- range(r$x) rlat <- range(r$y) ## Is out of the bounding box? test.lat <- (min(rlat) > max(salat)) | (max(rlat) < min(salat)) test.lon <- (min(rlon) > max(salon)) | (max(rlon) < min(salon)) keep[i] <- !(test.lon | test.lat) } keep <- which(keep==1) # Index of shapefiles that passed the first filter length(keep) # 773 species passed 1st filter # Step 2. Load invasive species distribution area study_area <- readOGR("/Users/.../Study area","Study_area") study_area <- spTransform(study_area, CRS("+init=epsg:32721")) # To UTM # Get a report of geometry validity & issues for a sp spatial object report <- clgeo_CollectionReport(study_area) summary <- clgeo_SummaryReport(report) summary # Remove holes and simplify geometry of study area for computational efficiency outer <- Filter(function(x){x@ringDir==1}, study_area@polygons[[1]]@Polygons) # PREGUNTAR study_area <- SpatialPolygons(list(Polygons(outer, ID=1))) study_area <- disaggregate(study_area) areas <- gArea(study_area, byid=TRUE) #Remove polygons smaller than 3000 km2 study_area <- study_area[which(areas > 3E9),] study_area <- gSimplify(study_area, 2, topologyPreserve=TRUE) # Simplify geometry proj4string(study_area) <- CRS("+init=epsg:32721") study_area <- spTransform(study_area, CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")) xy <- fortify(study_area) # SpatialPolygons to data frame keep2 <- rep(NA,length(keep)) # 773 sp. that passed first filter for(i in 1:length(keep)){ r <- read.shp(files[keep[i]], format="polygon") isin <- any(inside(r, x=xy$long, y=xy$lat), na.rm=T) keep2[i] <- isin } keep2 <- which(keep2) # Index of shapefiles that passed the second sort length(keep2) # 532 species that passed the second filter species <- list.files(path=mypath, pattern = "\\.shp$",full.names=F) species <- sub(".shp", "", species) species <- species[keep[keep2]] --- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus [[alternative HTML version deleted]] ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Re: [R-sig-Geo] bivariate spatial correlation in R
On Wed, 26 Jul 2017, Rafael Pereira wrote: Roger, This example was provided only for the sake or making the code easily reproducible for others and I'm more interested in how the bi-variate Moran could be implemented in R, but your comments are very much welcomed and I've made changes to the question. My actual case study looks at bi-variate spatial correlation between (a) average household income per capita and (b) proportion of jobs in the city that are accessible under 60 minutes by transit. I don't think I could use rates in this case but I will normalize the variables using scale(data$variable). Please provide a reproducible example, either with a link to a data subset, or using a builtin data set. My guess is that you do not need bi-variate spatial correlation at all, but rather a spatial regression. The "causal" variable would then the the proportion of jobs accessible within 60 minutes by transit, though this is extremely blunt, and lots of other covariates (demography, etc.) impact average household income per capita (per block/tract?). Since there are many missing variables in your specification, any spatial correlation would be most closely associated with them (demography, housing costs, education, etc.), and the choice of units of measurement would dominate the outcome. This is also why bi-variate spatial correlation is seldom a good idea, I believe. It can be done, but most likely shouldn't, unless it can be motivated properly. By the way, the weighted and FDR-corrected SAD local Moran's I p-values of the black/white ratio for Oregon (your toy example) did deliver the goods - if you zoom in in mapview::mapview, you can see that it detects a rate hotspot between the rivers. Roger best, Rafael H M Pereira On Mon, Jul 24, 2017 at 7:56 PM, Roger Bivandwrote: On Mon, 24 Jul 2017, Rafael Pereira wrote: Hi all, I would like to ask whether some you conducted bi-variate spatial correlation in R. I know the bi-variate Moran's I is not implemented in the spdep library. I left a question on SO but also wanted to hear if anyone if the mainlist have come across this. https://stackoverflow.com/questions/45177590/map-of-bivariat e-spatial-correlation-in-r-bivariate-lisa I also know Roger Bivand has implemented the L index proposed by Lee (2001) in spdep, but I'm not I'm not sure whether the L local correlation coefficients can be interpreted the same way as the local Moran's I coefficients. I couldn't find any reference commenting on this issue. I would very much appreciate your thoughts this. In the SO question, and in the follow-up, your presumably throw-away example makes fundamental mistakes. The code in spdep by Virgilio Gómez-Rubio is for uni- and bivariate L, and produces point values of local L. This isn't the main problem, which is rather that you are not taking account of the underlying population counts, nor shrinking any estimates of significance to accommodate population sizes. Population sizes vary from 0 to 11858, with the lower quartile at 3164 and upper 5698: plot(ecdf(oregon.tract$pop2000)). Should you be comparing rates in stead? These are also compositional variables (sum to pop2000, or 1 if rates) with the other missing components. You would probably be better served by tools examining spatial segregation, such as for example the seg package. The 0 count populations cause problems for an unofficial alternative, the black/white ratio: oregon.tract1 <- oregon.tract[oregon.tract$white > 0,] oregon.tract1$rat <- oregon.tract1$black/oregon.tract1$white nb <- poly2nb(oregon.tract1) lw <- nb2listw(nb) which should still be adjusted by weighting: lm0 <- lm(rat ~ 1, weights=pop2000, data=oregon.tract1) I'm not advising this, but running localmoran.sad on this model output yields SAD p-values < 0.05 after FDR correction only in contiguous tracts on the Washington state line in Portland between the Columbia and Williamette rivers. So do look at the variables you are using before rushing into things. Hope this clarifies, Roger best, Rafael HM Pereira http://urbandemographics.blogspot.com [[alternative HTML version deleted]] ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo -- Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html http://orcid.org/-0003-2392-6140 https://scholar.google.no/citations?user=AWeghB0J=en [[alternative HTML version deleted]] ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo -- Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55;
Re: [R-sig-Geo] bivariate spatial correlation in R
Roger, This example was provided only for the sake or making the code easily reproducible for others and I'm more interested in how the bi-variate Moran could be implemented in R, but your comments are very much welcomed and I've made changes to the question. My actual case study looks at bi-variate spatial correlation between (a) average household income per capita and (b) proportion of jobs in the city that are accessible under 60 minutes by transit. I don't think I could use rates in this case but I will normalize the variables using scale(data$variable). best, Rafael H M Pereira On Mon, Jul 24, 2017 at 7:56 PM, Roger Bivandwrote: > On Mon, 24 Jul 2017, Rafael Pereira wrote: > > Hi all, >> >> I would like to ask whether some you conducted bi-variate spatial >> correlation in R. >> >> I know the bi-variate Moran's I is not implemented in the spdep library. >> I left a question on SO but also wanted to hear if anyone if the mainlist >> have come across this. >> https://stackoverflow.com/questions/45177590/map-of-bivariat >> e-spatial-correlation-in-r-bivariate-lisa >> >> I also know Roger Bivand has implemented the L index proposed by Lee >> (2001) >> in spdep, but I'm not I'm not sure whether the L local correlation >> coefficients can be interpreted the same way as the local Moran's I >> coefficients. I couldn't find any reference commenting on this issue. I >> would very much appreciate your thoughts this. >> > > In the SO question, and in the follow-up, your presumably throw-away > example makes fundamental mistakes. The code in spdep by Virgilio > Gómez-Rubio is for uni- and bivariate L, and produces point values of local > L. This isn't the main problem, which is rather that you are not taking > account of the underlying population counts, nor shrinking any estimates of > significance to accommodate population sizes. Population sizes vary from 0 > to 11858, with the lower quartile at 3164 and upper 5698: > plot(ecdf(oregon.tract$pop2000)). Should you be comparing rates in stead? > These are also compositional variables (sum to pop2000, or 1 if rates) with > the other missing components. You would probably be better served by tools > examining spatial segregation, such as for example the seg package. > > The 0 count populations cause problems for an unofficial alternative, the > black/white ratio: > > oregon.tract1 <- oregon.tract[oregon.tract$white > 0,] > oregon.tract1$rat <- oregon.tract1$black/oregon.tract1$white > nb <- poly2nb(oregon.tract1) > lw <- nb2listw(nb) > > which should still be adjusted by weighting: > > lm0 <- lm(rat ~ 1, weights=pop2000, data=oregon.tract1) > > I'm not advising this, but running localmoran.sad on this model output > yields SAD p-values < 0.05 after FDR correction only in contiguous tracts > on the Washington state line in Portland between the Columbia and > Williamette rivers. So do look at the variables you are using before > rushing into things. > > Hope this clarifies, > > Roger > > >> best, >> >> Rafael HM Pereira >> http://urbandemographics.blogspot.com >> >> [[alternative HTML version deleted]] >> >> ___ >> R-sig-Geo mailing list >> R-sig-Geo@r-project.org >> https://stat.ethz.ch/mailman/listinfo/r-sig-geo >> >> > -- > Roger Bivand > Department of Economics, Norwegian School of Economics, > Helleveien 30, N-5045 Bergen, Norway. > voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no > Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html > http://orcid.org/-0003-2392-6140 > https://scholar.google.no/citations?user=AWeghB0J=en [[alternative HTML version deleted]] ___ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo