Re: [R-sig-Geo] large shapefiles; zip code data

Roger Bivand Thu, 12 Mar 2009 06:56:03 -0700

On Thu, 12 Mar 2009, Ben Fissel wrote:

Hello,


I am attempting to fit a CAR count data model of the Besag-York-Mollie
form to US zip code data (entire US minus Hawaii and Alaska).  However, I'm
running out of memory reading in the zip code data.  I obtained the zip code
data as shapefiles from the census:
http://www2.census.gov/geo/tiger/TIGER2008/tl_2008_us_zcta500.zip
I've allocated 4GB of memory to R, which is the max my OS (Vista) will give
it.  Despite this, I run out of memory when I attempt to load the shapefiles
with readOGR or readShapePoly.  I had a similar problem
in Stata and worked around it by reading in the shapefiles for the lower 48
states http://www.census.gov/geo/www/cob/z52000.html separately and
concatenating them together, relabeling the ID in the process.  I'm trying
to do the same thing in R but relabeling the ID is not as straightforward
for me given my novice R programming ability.  Luckily I found a little help
at
http://help.nceas.ucsb.edu/R:_Spatial#Understanding_spatial_data_formats_in_R
which I adapted to my code.


Ben,

In addition to the answers you've already had, you could look at the
spRbind() and spChFIDs() methods in maptools. spChFIDs() lets you
manipulate the IDs (to make them unique, for example), and spRbind()
sticks them together. I've used the combination for assembling US census
tracts (68'), so a larger task than you face. I have then run poly2nb() in
spdep on the output, which completed, although needing a lot of time.

There is an example of this in detail on the ASDAR book website,
http://www.asdar-book.org, see the code examples for Chapter 5, but there
just assembling data for counties in three US states.
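As a rough illustration of the relabel-then-bind idea, here is a minimal sketch on two toy one-polygon objects. It is not the ASDAR Chapter 5 code. It assumes a version of sp that provides spChFIDs() and an rbind() method for SpatialPolygonsDataFrame objects, and it uses that rbind() as a stand-in for maptools' spRbind(). The helper make_square(), the "01_"/"04_" ID prefixes and the ZCTA values are made up for the example.

```r
library(sp)

# Hypothetical helper: a one-square SpatialPolygonsDataFrame whose single
# feature has ID "0", mimicking per-state files that each restart their IDs.
make_square <- function(x0, zcta) {
  # Clockwise ring, closed by repeating the first vertex
  coords <- cbind(c(x0, x0, x0 + 1, x0 + 1, x0), c(0, 1, 1, 0, 0))
  polys <- SpatialPolygons(
    list(Polygons(list(Polygon(coords, hole = FALSE)), ID = "0"))
  )
  attrs <- data.frame(ZCTA = zcta, row.names = "0", stringsAsFactors = FALSE)
  SpatialPolygonsDataFrame(polys, attrs)
}

state_a <- make_square(0, "35004")  # made-up attribute values
state_b <- make_square(1, "85001")

# Both objects carry the ID "0", so give each feature an ID that is unique
# across files before binding (here: state code prefix + old ID).
state_a <- spChFIDs(state_a, paste("01", row.names(state_a), sep = "_"))
state_b <- spChFIDs(state_b, paste("04", row.names(state_b), sep = "_"))

# Bind the relabelled objects; polygons and attribute rows stay aligned.
merged <- rbind(state_a, state_b)
merged_ids <- row.names(merged)
```

With real data the same two steps would sit inside the loop over state files, with the prefix taken from the state code so that no two files can produce the same ID.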


Hope this helps,

Roger


spatdata <- readOGR(".", "zt01_d00")
#spatdata <- readShapePoly("zt01_d00")
names(spatdata)[3] <- "ZT00_D00"
names(spatdata)[4] <- "ZT00_D00_I"

for (j in 2:2){    # Just loop over one file until I get it to work
  # statelist is a vector of the form statelist <- c("01","04",...,"56"),
  # with numbers that correspond to the 48 state shapefiles
  filename <- paste("zt", statelist[j], "_d00", sep = "")

  spatdf <- readOGR(".", filename)
# spatdf <- readShapePoly(filename)
  names(spatdf)[3] <- "ZT00_D00"
  names(spatdf)[4] <- "ZT00_D00_I"
  mergedata <- rbind(spatd...@data,spa...@data)
  mergepolys <- c(spatd...@polygons,spa...@polygons)
  mergepolysp <-
SpatialPolygons(mergepolys,proj4string=CRS(proj4string(spatdf)))
  rm("spatdata","spatdf","filename")

  for (i in 1: length(mergepolys)){
    sNew = as.character(i)
    mergepolys...@id = sNew
  }
  ID <- c(as.character(1:length(mergepolys)))
  mergedataID <- cbind(ID,mergedata)
  spatdata <- SpatialPolygonsDataFrame(mergepolysp,data =
mergedataID,match.ID = FALSE)
  rm("mergepolys","mergedata","mergepolysp","mergedataID","ID")

  gc()
}

However in the for loop over "i" I get an error when trying to relabel the
ID: "Error in validObject(.Object) :  invalid class "SpatialPolygons"
object: non-unique Polygons ID slot values".  I've tried a number of
different ways to change the ID in 'mergepolys' but haven't been successful
yet.

Ultimately, I just want to get the shapefiles into R so I can identify
contiguous zip codes for the spatial regression.  Whether I get this by
loading in one big shape zip code file or concatenating 48 state files is
irrelevant to me.  Perhaps the census shapefiles have superfluous data that
I can get rid of to free up memory and still achieve my objective, but I
don't know enough about shapefiles and how R reads them to know what I can
throw away.  Maybe I'm going about this all wrong.  Thank you for any help
and/or suggestions that you can provide.

After getting the shapefiles in I plan to identify contiguous zip codes and
use R2WinBUGS to fit the model as outlined in "Applied Spatial Data Analysis
with R".  However, given the memory issues I'm having, I am concerned that
forming the spatial weighting matrix won't be possible.  Will R try to store
this as an n x n matrix?  Furthermore, I have 50+ other covariates that
I need to merge in with the zip code data, which will take up memory as
well.  Simply put, is the memory bottleneck just in the function(s) loading
the shapefiles, or am I going to have trouble fitting this model with the
covariates in R?
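On the n x n question: poly2nb() in spdep returns a neighbour list (class "nb", one integer vector of neighbour indices per area), not a dense matrix. The base-R sketch below only illustrates the difference in storage between the two representations. The size n and the ring-shaped neighbour structure are toy assumptions, not the real zip code layout.

```r
# Toy size for illustration only (an assumption, not the real number of areas)
n <- 2000

# Dense n x n weights matrix of doubles: every pair of areas gets a cell.
dense <- matrix(0, nrow = n, ncol = n)

# Neighbour list: one integer vector of neighbour indices per area.
# Each area gets its two ring neighbours as a stand-in for contiguity.
nb <- lapply(seq_len(n), function(i) {
  as.integer(c(if (i == 1) n else i - 1, if (i == n) 1 else i + 1))
})

# Storage used by each representation, in bytes
dense_bytes <- as.numeric(object.size(dense))
list_bytes <- as.numeric(object.size(nb))
ratio <- dense_bytes / list_bytes
```

The dense matrix grows with n squared, while the list grows with the number of neighbour links, so the gap widens as n increases.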

I've seen the thread "mapping by zip codes"
https://stat.ethz.ch/pipermail/r-sig-geo/2009-March/005194.html , which
provides very useful information but hasn't helped me get around the
problems I'm having.

I've tried to be complete yet concise.  If there is any other information
you need please let me know.

Thanks for any help and/or suggestions you can provide.

-Ben


--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [email protected]

_______________________________________________
R-sig-Geo mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
