On Thu, 3 May 2018, Roger Bivand wrote:

On Thu, 3 May 2018, Aguirre Perez, Roman wrote:

 Hi again,

 first of all, thanks a lot for your comments. It's becoming quite
 interesting how to perform this task with such amount of data.


 Here is an update...

 Cotton, I could read the geodatabase by using the commands I shared.
 However, my R session crashed after overlaying it with a small set of
 polygons. I guess it was because the size of the sp object (around 14.5
 GB). It's also worth mentioning that it was just a "multipolygon" (is that
 the correct word?) formed by 2370829 polygons. I haven’t succeeded on
 installing the sf package, but I will keep trying it.

 Melanie, I already downloaded and read the geoTiff version. At first
 sight, it seems that this object doesn’t have enough features as to
 perform the overlying. I might have a biased idea of how raster objects
 work - I'm too stick to the representation of shapefiles which sounds
 quite related with Roger's idea. So I will start to explore it in order to
 gain a bit more understanding on it.

 Roger, I certainly need to know which category is associated with each CLC
 polygon in order to compute the area covered by each of these classes
 within each NUTS3 region. Therefore, I still need to use a vector-vector
 overlay, right?

On the contrary. The geoTiffs are coded as described in the documentation. Your output is simply the count of raster cells by NUTS3 for each of the categories. The geoTiff is in ETRS_1989_LAEA, so is projected; it includes a lot of sea and no data because of the French overseas territories.

You could possibly use PostGIS to do the intersections, or use GRASS for raster-vector counts (say through rgrass7). In R, you would want to add a NUTS3 ID band to the land cover raster, then aggregate by NUTS3 ID.

I would suggest using GRASS as the most obvious route, reading the raster, reading the NUTS3 boundaries into a separate location, projecting to LAEA, then rasterising the NUTS3 regions (v.to.rast) and running r.cross. You get 32K output categories, the 48 corine categories times the count of NUTS3 regions minus the nulls (there aren't many glaciers in most regions). You'll then need to match back the Corine codes and the NUTS3 codes - see in the category label file shown by r.category.

I'll try to provide code tomorrow.

Combining R and GRASS seems to work, but no guarantees.

library(sp)
library(rgdal)
Gi <- GDALinfo("g250_clc12_V18_5.tif")
makeSG <- function(x) {
  stopifnot(class(x) == "GDALobj")
  p4 <- attr(x, "projection")
  gt <- GridTopology(c(x[4]+(x[6]/2), x[5]+(x[7]/2)), c(x[6], x[7]),
    c(x[2], x[1]))
  SpatialGrid(gt, CRS(p4))
}
SG <- makeSG(Gi)
nuts_ll <- readOGR("NUTS_RG_01M_2013_4326_LEVL_3.shp")
nuts_laea <- spTransform(nuts_ll, CRS(attr(Gi, "projection")))
library(rgrass7)
td <- tempdir()
iG <- initGRASS("/home/rsb/topics/grass/g740/grass-7.4.0", td, SG)
# your GRASS will be where you installed it
writeVECT(nuts_laea, "nuts", v.in.ogr_flags="o")
execGRASS("r.in.gdal", input="g250_clc12_V18_5.tif", output="corine",
  flags="o")
execGRASS("v.to.rast", input="nuts", output="nuts3", use="cat",
  label_column="FID")
execGRASS("r.cross", input="nuts3,corine", output="cross_nuts3")
r_stats0 <- execGRASS("r.stats", input="cross_nuts3", flags="a",
  intern=TRUE)
r_stats1 <- gsub("\\*", "NA", r_stats0)
con_stats <- textConnection(r_stats1)
stats <- read.table(con_stats, header=FALSE, col.names=c("cross_cat",
  "area"), colClasses=c("integer", "numeric"))
close(con_stats)
r_cats0 <- execGRASS("r.category", map="cross_nuts3", intern=TRUE)
r_cats1 <- gsub(";", "", r_cats0)
r_cats2 <- gsub("\t", " ", r_cats1)
r_cats3 <- gsub("no data", "no_data", r_cats2)
r_cats4 <- gsub("category ", "", r_cats3)
r_cats4[1] <- paste0(r_cats4[1], "NA NA")
r_cats_split <- strsplit(r_cats4, " ")
cats <- data.frame(cross_cat=as.integer(sapply(r_cats_split, "[", 1)),
  nuts=sapply(r_cats_split, "[", 2),
  corine=as.integer(sapply(r_cats_split, "[", 3)))
catstats <- merge(cats, stats, by="cross_cat", all=TRUE)
agg_areas <- tapply(catstats$area, list(catstats$nuts, catstats$corine),
  sum)
library(foreign)
corine_labels <- read.dbf("g250_clc12_V18_5.tif.vat.dbf", as.is=TRUE)
o <- match(colnames(agg_areas), as.character(corine_labels$Value))
colnames(agg_areas) <- corine_labels$LABEL3[o]
agg_areas_df <- as.data.frame(agg_areas)
agg_areas_df1 <- agg_areas_df[-which(!(row.names(agg_areas_df) %in%
  as.character(nuts_ll$FID))),] # dropping "NA"      "no_data"

This should be ready to merge with the NUTS3 boundaries, if needed.

agg_areas_df1$FID <- row.names(agg_areas_df1)
nuts_corine <- merge(nuts_laea, agg_areas_df1, by="FID")

For the vector parts you could use sf and the provisional rgrass7sf on github, but that wouldn't yet let you construct a skeleton SpatialGrid to define the GRASS location. Using GRASS for the heavy lifting (the raster is 51000 by 35000), and avoiding vector for overlay, this doesn't need much memory (GRASS handles rasters by row). The GRASS temporary location only takes 130MB of disk space. You could go for the 100m raster resolution, but I doubt that the outcome would vary much - anyone like to try?

If the sub-polygons by NUTS and corine categories are actually needed, the output of r.cross could be passed to r.to.vect:

execGRASS("r.to.vect", input="cross_nuts3", output="cross_nuts3",
  type="area")

but this is more demanding in memory terms.

Interesting case, and it does show that combining GIS and R delivers the goods - SAGA would probably work equivalently.

Roger


Roger


 I really appreciate any feedback in advance as well as details that I
 should take into account to understand more about how to work with this
 kind of data. I will also keep you up to date on how it goes if you like.


 Best,
 Roman.

 On 03/05/2018, 12:26, "Roger Bivand" <roger.biv...@nhh.no> wrote:

    On Thu, 3 May 2018, Aguirre Perez, Roman wrote:

   >  Hello Shaun,
   >
   >  Thanks a lot for replying and providing me alternative options.
   >
   >
   >  Unfortunately, I can't try both options anymore as I ran out of
   >  ArcGIS
   >  due to I was using a university PC which is not available now (I
   >  installed an ArcGIS trial version there), but I'll try it later. I
   >  also
   >  failed on installing the sf package - I'll dig a bit more on it.
   >  Nevertheless, I already downloaded the SpatialLite version so I'll
   >  try
   >  the second option and I'll let you know how it goes.

    I think Melanie got it right - the apparent extra detail and precision
    you'd get from vector-vector overlay is illusory, so going with geoTiff
    should get you there (you can check to see whether 100m resolution
    differs
    from 250m). The only reason to choose vector-vector would be that the
    Corine vector contains categories not represented in the raster
    version.
    Using raster also steps around the polygon deficiencies in the GDB (and
    probably SQLite) representations.

    Roger

   >
   >
   >  Regards,
   >  Roman.
   >
   >  On 02/05/2018, 18:53, "Shaun Walbridge" <swalbri...@esri.com> wrote:
   >
   >     Hello Roman,
   >
   >     A couple of suggested options: You can try and use the
   >     arcgisbinding package [1] to pull the data directly into R via
   >     ArcGIS, as you mentioned you have ArcGIS available [presumably on
   >     Windows or in a VM]. It will access the data directly, and let you
   >     create sp and sf objects out of Geodatabases with little work. A
   >     basic workflow looks like this:
   >
   >     d <- arc.open("path/file.gdb/layer_name")
   >     df <- arc.select(d) # here, you can filter columns and attributes,
   >     see [2]
   >     # create an sf object
   >     df.sf <- arc.data2sf(df)
   >     # alternatively, create an sp object
   >     df.sp <- arc.data2sp(df)
   >
   >     You can also write the results back to a geodatabase, or any other
   >     format that ArcGIS understands. Another option is trying the
   >     SpatiaLite version of the data at the URL you posted. You should
   >     be able to access this using R directly, provided your rgdal
   >     installation is correctly built to read spatialite databases. If
   >     it is, try the same process you mentioned using rgdal, but point
   >     it at the SpatialLite database instead. You could also use
   >     command-line OGR to convert the data, that has a few options like
   >     where clause filtering that aren't directly available via readOGR.
   >
   >     If neither of these options work, let me know and I can convert
   >     the data for you into a format of your preference.
   >
   >     Cheers,
   >     Shaun
   >
   >     1.
   >     
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_R-2DArcGIS_r-2Dbridge-2Dinstall&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=yckcjR6k3nADiVsNiAhGwcZB--0A8DQgvLSJ27upmyk&m=dkecE8PWOn5sSrtSXqX2R-VwJuFH_In4BvpC5da4LJ0&s=nRO2IG9dlrdKYGMSp3rCf7nwJqRFQoIWnY5zSFrPOQM&e=
   >     2.
   >     
https://urldefense.proofpoint.com/v2/url?u=https-3A__rdrr.io_github_R-2DArcGIS_r-2Dbridge_man_arc.select.html&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=yckcjR6k3nADiVsNiAhGwcZB--0A8DQgvLSJ27upmyk&m=dkecE8PWOn5sSrtSXqX2R-VwJuFH_In4BvpC5da4LJ0&s=89N-cLe5N3dhZ6MTsM2OAYwmDImBH88ZrVaA5Hos0n4&e=
   >
   >
   >
   >     On 5/2/18, 1:34 PM, "R-sig-Geo on behalf of Aguirre Perez, Roman"
   >     <r-sig-geo-boun...@r-project.org on behalf of ra...@exeter.ac.uk>
   >     wrote:
   >
   >         Hi everyone,
   >
   >         I’ve been struggling with a ESRI Geodatabase file
   >         (clc12_Version_18_5.gdb) which is a layer of land cover
   >         classes available on
   >
   >         
https://urldefense.proofpoint.com/v2/url?u=https-3A__land.copernicus.eu_pan-2Deuropean_corine-2Dland-2Dcover_clc-2D2012-3Ftab-3Ddownload&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=YFaRLkcUCdDkLrpTbNOUV9J1CwYBCTMwgm5tdQkRSm4&m=qIGI8rXMP_JTDhSFz8NjjQMAyNAnpnUGFRBxqfH4bPU&s=9LaO4HiB5C5ewiuN_IeSQJSgq2tl5_-oMECPChtZa_U&e=
   >
   >         whose size is 2.61 GB once unzipped.
   >
   >         My ultimate task is to overlay it with the last NUTS3
   >         administrative boundaries shapefile (2013) available on
   >
   >         
https://urldefense.proofpoint.com/v2/url?u=http-3A__ec.europa.eu_eurostat_web_gisco_geodata_reference-2Ddata_administrative-2Dunits-2Dstatistical-2Dunits&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=YFaRLkcUCdDkLrpTbNOUV9J1CwYBCTMwgm5tdQkRSm4&m=qIGI8rXMP_JTDhSFz8NjjQMAyNAnpnUGFRBxqfH4bPU&s=VExYSoR8FY_vhZaPNJpoOx0ZCZNMU9hMTtlGJZj2joU&e=
   >
   >         in order to compute the area covered by each class within each
   >         NUTS3 region.
   >
   >         Despite the ease of friendly software for performing this
   >         task, haven’t been capable of doing it - GRASS didn’t load the
   >         file as the log reports problems with the polygons, QGIS shows
   >         a warning regarding a specific object and ArcGIS got frozen. I
   >         guess it’s because the PC I used doesn’t have enough capacity.
   >         Unfortunately, I don’t have access to a more powerful one.
   >
   >         Anyway, I decided to try with R -after all, I’ll perform my
   >         analysis with it. So I started exploring this GDB with rgdal:
   >
   >         ogrInfo(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")
   >
   >         Source:
   >         "/Users/Roman/Desktop/clc12gdb/clc12_Version_18_5.gdb",
   >         layer: "clc12_Version_18_5"
   >         Driver: OpenFileGDB;
   >         number of rows: 2370829
   >         Feature type: wkbPolygon with 3 dimensions
   >         Extent: (-2693292 -3086662) - (10037210 5440568)
   >         Null geometry IDs: 2156240
   >         CRS: +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000
   >         +ellps=GRS80 +units=m +no_defs
   >         Number of fields: 6
   >         name                      type  length  typeName
   >         1 code_12              4       3           String
   >         2 ID                         4       18         String
   >         3 Remark               4       20         String
   >         4 Area_Ha             2       0           Real
   >         5 Shape_Length   2       0           Real
   >         6 Shape_Area       2       0           Real
   >
   >         Then I tried to load it typing
   >
   >         
clc<-readOGR(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")
   >
   >         After 3-5 minutes, it appears the following text:
   >
   >         OGR data source with driver: OpenFileGDB
   >         Source:
   >         "/Users/Roman/Desktop/clc12shp/clc12_Version_18_5.gdb", layer:
   >         "clc12_Version_18_5"
   >         with 2370829 features
   >         It has 6 fields
   >
   >         Unfortunately, after trying 5 times (each one took around 8
   >         hours) I couldn’t get anything but the following messages:
   >
   >         Warning messages:
   >         1: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
   >         "clc12_Version_18_5") :
   >         Dropping null geometries: 2156240
   >         2: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
   >         "clc12_Version_18_5") :
   >         Z-dimension discarded
   >
   >         Could anyone advise me how to tackle it? I also would
   >         appreciate suggestions on how to work with geodatabases - it’s
   >         my first time I work with these kind of files so I don’t even
   >         know their structure.
   >
   >         By the way, these are my computer and software specifications:
   >
   >         MacBook Pro 2012
   >         Processor 2.6 GHz Intel Core i7
   >         Memory 16 GB DDR3
   >         R 3.5.0
   >         RStudio 1.1.447
   >
   >
   >         Best,
   >         Roman.
   >
   >
   >
   >
   >  _______________________________________________
   >  R-sig-Geo mailing list
   >  R-sig-Geo@r-project.org
   >  https://stat.ethz.ch/mailman/listinfo/r-sig-geo

    --
    Roger Bivand
    Department of Economics, Norwegian School of Economics,
    Helleveien 30, N-5045 Bergen, Norway.
    voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no
    http://orcid.org/0000-0003-2392-6140
    https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en





--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Reply via email to