On 05/05/2018, 12:03, "Roger Bivand" <roger.biv...@nhh.no> wrote:
On Fri, 4 May 2018, Roger Bivand wrote:

> On Thu, 3 May 2018, Roger Bivand wrote:
>
>> On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
>>
>>> Hi again,
>>>
>>> First of all, thanks a lot for your comments. Working out how to
>>> perform this task with such an amount of data is becoming quite
>>> interesting.
>>>
>>> Here is an update...
>>>
>>> Cotton, I could read the geodatabase using the commands I shared.
>>> However, my R session crashed after overlaying it with a small set
>>> of polygons, I guess because of the size of the sp object (around
>>> 14.5 GB). It is also worth mentioning that it was a single
>>> multipolygon (is that the correct word?) made up of 2370829
>>> polygons. I haven't succeeded in installing the sf package yet, but
>>> I will keep trying.
>>>
>>> Melanie, I have already downloaded and read the GeoTIFF version. At
>>> first sight, this object does not seem to have enough features to
>>> perform the overlay. I may have a biased idea of how raster objects
>>> work - I am too attached to the shapefile representation, which
>>> seems closely related to Roger's idea - so I will start exploring
>>> it to gain a bit more understanding.
>>>
>>> Roger, I certainly need to know which category is associated with
>>> each CLC polygon in order to compute the area covered by each of
>>> these classes within each NUTS3 region. Therefore, I still need to
>>> use a vector-vector overlay, right?
>>
>> On the contrary. The GeoTIFFs are coded as described in the
>> documentation. Your output is simply the count of raster cells by
>> NUTS3 for each of the categories. The GeoTIFF is in ETRS_1989_LAEA,
>> so it is projected; it includes a lot of sea and no-data cells
>> because of the French overseas territories.
>>
>> You could possibly use PostGIS to do the intersections, or use GRASS
>> for raster-vector counts (say through rgrass7). In R, you would want
>> to add a NUTS3 ID band to the land cover raster, then aggregate by
>> NUTS3 ID (a sketch of this pure-R route appears after the code
>> below).
>>
>> I would suggest GRASS as the most obvious route: read the raster,
>> read the NUTS3 boundaries into a separate location, project them to
>> LAEA, then rasterise the NUTS3 regions (v.to.rast) and run r.cross.
>> You get 32K output categories - the 48 Corine categories times the
>> count of NUTS3 regions, minus the nulls (there aren't many glaciers
>> in most regions). You will then need to match the Corine codes and
>> the NUTS3 codes back in - see the category label file shown by
>> r.category.
>>
>> I'll try to provide code tomorrow.
>
> Combining R and GRASS seems to work, but no guarantees.
>
> library(sp)
> library(rgdal)
> Gi <- GDALinfo("g250_clc12_V18_5.tif")
> makeSG <- function(x) {
>   stopifnot(class(x) == "GDALobj")
>   p4 <- attr(x, "projection")
>   gt <- GridTopology(c(x[4]+(x[6]/2), x[5]+(x[7]/2)), c(x[6], x[7]),
>     c(x[2], x[1]))
>   SpatialGrid(gt, CRS(p4))
> }
> SG <- makeSG(Gi)
> nuts_ll <- readOGR("NUTS_RG_01M_2013_4326_LEVL_3.shp")
> nuts_laea <- spTransform(nuts_ll, CRS(attr(Gi, "projection")))
> library(rgrass7)
> td <- tempdir()
> iG <- initGRASS("/home/rsb/topics/grass/g740/grass-7.4.0", td, SG)
> # your GRASS will be where you installed it
> writeVECT(nuts_laea, "nuts", v.in.ogr_flags="o")
> execGRASS("r.in.gdal", input="g250_clc12_V18_5.tif", output="corine",
>   flags="o")
> execGRASS("v.to.rast", input="nuts", output="nuts3", use="cat",
>   label_column="FID")
> execGRASS("r.cross", input="nuts3,corine", output="cross_nuts3")
> r_stats0 <- execGRASS("r.stats", input="cross_nuts3", flags="a",
>   intern=TRUE)
> r_stats1 <- gsub("\\*", "NA", r_stats0)
> con_stats <- textConnection(r_stats1)
> stats <- read.table(con_stats, header=FALSE, col.names=c("cross_cat",
>   "area"), colClasses=c("integer", "numeric"))
> close(con_stats)
> r_cats0 <- execGRASS("r.category", map="cross_nuts3", intern=TRUE)
> r_cats1 <- gsub(";", "", r_cats0)
> r_cats2 <- gsub("\t", " ", r_cats1)
> r_cats3 <- gsub("no data", "no_data", r_cats2)
> r_cats4 <- gsub("category ", "", r_cats3)
> r_cats4[1] <- paste0(r_cats4[1], "NA NA")
> r_cats_split <- strsplit(r_cats4, " ")
> cats <- data.frame(cross_cat=as.integer(sapply(r_cats_split, "[", 1)),
>   nuts=sapply(r_cats_split, "[", 2),
>   corine=as.integer(sapply(r_cats_split, "[", 3)))
> catstats <- merge(cats, stats, by="cross_cat", all=TRUE)
> agg_areas <- tapply(catstats$area, list(catstats$nuts, catstats$corine),
>   sum)
> library(foreign)
> corine_labels <- read.dbf("g250_clc12_V18_5.tif.vat.dbf", as.is=TRUE)
> o <- match(colnames(agg_areas), as.character(corine_labels$Value))
> colnames(agg_areas) <- corine_labels$LABEL3[o]
> agg_areas_df <- as.data.frame(agg_areas)
> agg_areas_df1 <- agg_areas_df[-which(!(row.names(agg_areas_df) %in%
>   as.character(nuts_ll$FID))),] # dropping "NA" "no_data"
>
> This should be ready to merge with the NUTS3 boundaries, if needed.
>
> agg_areas_df1$FID <- row.names(agg_areas_df1)
> nuts_corine <- merge(nuts_laea, agg_areas_df1, by="FID")

And to write out as GPKG:

names(nuts_corine)[1] <- "FID_"
writeOGR(nuts_corine, "nuts_corine.gpkg", layer="corine", driver="GPKG")

It seems that at least this driver treats "FID" as a reserved field name.

> For the vector parts you could use sf and the provisional rgrass7sf on
> github, but that wouldn't yet let you construct a skeleton SpatialGrid
> to define the GRASS location. Using GRASS for the heavy lifting (the
> raster is 51000 by 35000), and avoiding vector for overlay, this
> doesn't need much memory (GRASS handles rasters by row). The GRASS
> temporary location only takes 130MB of disk space. You could go for
> the 100m raster resolution, but I doubt that the outcome would vary
> much - anyone like to try?
>
> If the sub-polygons by NUTS and Corine categories are actually needed,
> the output of r.cross could be passed to r.to.vect:
>
> execGRASS("r.to.vect", input="cross_nuts3", output="cross_nuts3",
>   type="area")
>
> but this is more demanding in memory terms.
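For reference, here is a minimal, untested sketch of the pure-R route
mentioned above ("add a NUTS3 ID band to the land cover raster, then
aggregate by NUTS3 ID"), assuming the raster package, which is not
otherwise used in this thread. raster() keeps the GeoTIFF on disk and
crosstab() should process it block by block, but rasterize() over a
51000 by 35000 grid is likely to be very slow, so the GRASS route above
remains the tested one; the object names here are illustrative only.

library(raster)
library(rgdal)

# Corine 250m GeoTIFF as an out-of-memory RasterLayer
corine <- raster("g250_clc12_V18_5.tif")

# NUTS3 boundaries, reprojected to the raster's LAEA CRS
nuts_ll <- readOGR("NUTS_RG_01M_2013_4326_LEVL_3.shp")
nuts_laea <- spTransform(nuts_ll, CRS(projection(corine)))

# the "NUTS3 ID band": burn a 1..n polygon index into a raster aligned
# with the land cover grid
nuts3_r <- rasterize(nuts_laea, corine, field=seq_len(nrow(nuts_laea)))

# cell counts per (NUTS3, Corine class) pair; multiplying by the cell
# area (250m x 250m) converts counts to square metres
tab <- crosstab(nuts3_r, corine)
areas_m2 <- tab * prod(res(corine))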
Hi Roger, thanks a lot again for helping me with this issue.
Unfortunately, I haven't managed to call GRASS from within R, nor to
install sf. My guess is that those issues arise because I'm using the
newest versions of both OS X High Sierra and R 3.5.0, which are not
yet stable.

> Reading into R is not possible, object too large.
>
> Roger

Bearing in mind the previous issues plus this last comment, I decided
to use a PC with another OS (Ubuntu), aiming to solve them and to work
with a bit more memory (32 GB).

In the meantime, I would really appreciate your guidance on gaining
background in this subject. Could you please suggest resources for
better understanding the features of the objects involved in spatial
statistics? I should mention that what I know about it comes from your
book, Applied Spatial Data Analysis with R.

Best,
Roman.

> Interesting case, and it does show that combining GIS and R delivers
> the goods - SAGA would probably work equivalently.
>
> Roger
>
>> Roger
>>
>>> Thanks in advance for any feedback, as well as for any details I
>>> should take into account to understand more about how to work with
>>> this kind of data. I will also keep you up to date on how it goes,
>>> if you like.
>>>
>>> Best,
>>> Roman.
>>>
>>> On 03/05/2018, 12:26, "Roger Bivand" <roger.biv...@nhh.no> wrote:
>>>
>>> On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
>>>
>>> > Hello Shaun,
>>> >
>>> > Thanks a lot for replying and providing alternative options.
>>> >
>>> > Unfortunately, I can't try either option right now, as I no
>>> > longer have access to ArcGIS: I was using a university PC which
>>> > is not available at the moment (I had installed an ArcGIS trial
>>> > version there), but I'll try it later. I also failed to install
>>> > the sf package - I'll dig into it a bit more. Nevertheless, I
>>> > have already downloaded the SpatiaLite version, so I'll try the
>>> > second option and let you know how it goes.
>>>
>>> I think Melanie got it right - the apparent extra detail and
>>> precision you'd get from vector-vector overlay is illusory, so
>>> going with the GeoTIFF should get you there (you can check whether
>>> the 100m resolution differs from the 250m one). The only reason to
>>> choose vector-vector would be that the Corine vector contains
>>> categories not represented in the raster version. Using the raster
>>> also steps around the polygon deficiencies in the GDB (and probably
>>> SQLite) representations.
>>>
>>> Roger
>>>
>>> > Regards,
>>> > Roman.
>>> >
>>> > On 02/05/2018, 18:53, "Shaun Walbridge" <swalbri...@esri.com>
>>> > wrote:
>>> >
>>> > Hello Roman,
>>> >
>>> > A couple of suggested options. You can try to use the
>>> > arcgisbinding package [1] to pull the data directly into R via
>>> > ArcGIS, as you mentioned you have ArcGIS available (presumably on
>>> > Windows or in a VM). It will access the data directly and let you
>>> > create sp and sf objects out of geodatabases with little work. A
>>> > basic workflow looks like this:
>>> >
>>> > d <- arc.open("path/file.gdb/layer_name")
>>> > df <- arc.select(d) # here, you can filter columns and
>>> >                     # attributes, see [2]
>>> > # create an sf object
>>> > df.sf <- arc.data2sf(df)
>>> > # alternatively, create an sp object
>>> > df.sp <- arc.data2sp(df)
>>> >
>>> > You can also write the results back to a geodatabase, or any
>>> > other format that ArcGIS understands. Another option is trying
>>> > the SpatiaLite version of the data at the URL you posted. You
>>> > should be able to access this using R directly, provided your
>>> > rgdal installation is correctly built to read SpatiaLite
>>> > databases.
>>> > If it is, try the same process you mentioned using rgdal, but
>>> > point it at the SpatiaLite database instead. You could also use
>>> > command-line OGR (ogr2ogr) to convert the data; it has a few
>>> > options, such as where-clause filtering, that aren't directly
>>> > available via readOGR.
>>> >
>>> > If neither of these options works, let me know and I can convert
>>> > the data for you into a format of your preference.
>>> >
>>> > Cheers,
>>> > Shaun
>>> >
>>> > 1. https://github.com/R-ArcGIS/r-bridge-install
>>> > 2. https://rdrr.io/github/R-ArcGIS/r-bridge/man/arc.select.html
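As an illustration of the command-line OGR route Shaun mentions, here
is a hypothetical ogr2ogr call run from R via system(). The output
file name and the where clause (keeping only Corine class 112) are
invented for the example; the layer name and the code_12 String field
are those reported by ogrInfo() further down.

# untested sketch: convert the geodatabase layer to SpatiaLite,
# filtering on an attribute before anything reaches R
system(paste(
  "ogr2ogr -f SQLite -dsco SPATIALITE=YES",
  "-where \"code_12 = '112'\"",
  "clc12_subset.sqlite",
  "clc12_Version_18_5.gdb clc12_Version_18_5"
))
# the subset can then be read with readOGR() as usual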
>>> > On 5/2/18, 1:34 PM, "R-sig-Geo on behalf of Aguirre Perez, Roman"
>>> > <r-sig-geo-boun...@r-project.org on behalf of ra...@exeter.ac.uk>
>>> > wrote:
>>> >
>>> > Hi everyone,
>>> >
>>> > I've been struggling with an ESRI geodatabase file
>>> > (clc12_Version_18_5.gdb), a layer of land cover classes available
>>> > at
>>> > https://land.copernicus.eu/pan-european/corine-land-cover/clc-2012?tab=download
>>> > whose size is 2.61 GB once unzipped.
>>> >
>>> > My ultimate task is to overlay it with the latest NUTS3
>>> > administrative boundaries shapefile (2013), available at
>>> > http://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units
>>> > in order to compute the area covered by each class within each
>>> > NUTS3 region.
>>> >
>>> > Despite the ease of user-friendly software for performing this
>>> > task, I haven't been capable of doing it - GRASS didn't load the
>>> > file (the log reports problems with the polygons), QGIS shows a
>>> > warning about a specific object, and ArcGIS froze. I guess it's
>>> > because the PC I used doesn't have enough capacity, and
>>> > unfortunately I don't have access to a more powerful one.
>>> >
>>> > Anyway, I decided to try with R - after all, I'll perform my
>>> > analysis with it. So I started exploring this GDB with rgdal:
>>> >
>>> > ogrInfo(dsn="clc12_Version_18_5.gdb", layer="clc12_Version_18_5")
>>> >
>>> > Source: "/Users/Roman/Desktop/clc12gdb/clc12_Version_18_5.gdb",
>>> > layer: "clc12_Version_18_5"
>>> > Driver: OpenFileGDB; number of rows: 2370829
>>> > Feature type: wkbPolygon with 3 dimensions
>>> > Extent: (-2693292 -3086662) - (10037210 5440568)
>>> > Null geometry IDs: 2156240
>>> > CRS: +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000
>>> > +ellps=GRS80 +units=m +no_defs
>>> > Number of fields: 6
>>> >           name type length typeName
>>> > 1      code_12    4      3   String
>>> > 2           ID    4     18   String
>>> > 3       Remark    4     20   String
>>> > 4      Area_Ha    2      0     Real
>>> > 5 Shape_Length    2      0     Real
>>> > 6   Shape_Area    2      0     Real
>>> >
>>> > Then I tried to load it by typing
>>> >
>>> > clc <- readOGR(dsn="clc12_Version_18_5.gdb",
>>> >   layer="clc12_Version_18_5")
>>> >
>>> > After 3-5 minutes, the following text appears:
>>> >
>>> > OGR data source with driver: OpenFileGDB
>>> > Source: "/Users/Roman/Desktop/clc12shp/clc12_Version_18_5.gdb",
>>> > layer: "clc12_Version_18_5"
>>> > with 2370829 features
>>> > It has 6 fields
>>> >
>>> > Unfortunately, after trying 5 times (each attempt took around 8
>>> > hours) I couldn't get anything but the following messages:
>>> >
>>> > Warning messages:
>>> > 1: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
>>> > "clc12_Version_18_5") :
>>> >   Dropping null geometries: 2156240
>>> > 2: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
>>> > "clc12_Version_18_5") :
>>> >   Z-dimension discarded
>>> >
>>> > Could anyone advise me on how to tackle this? I would also
>>> > appreciate suggestions on how to work with geodatabases - this is
>>> > the first time I have worked with this kind of file, so I don't
>>> > even know their structure.
>>> >
>>> > By the way, these are my computer and software specifications:
>>> >
>>> > MacBook Pro 2012
>>> > Processor: 2.6 GHz Intel Core i7
>>> > Memory: 16 GB DDR3
>>> > R 3.5.0
>>> > RStudio 1.1.447
>>> >
>>> > Best,
>>> > Roman.

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo