[R-sig-Geo] Count occurrences less memory expensive than superimpose function in several spatial objects

Adrian Baddeley Sat, 22 Aug 2020 04:04:55 -0700

Alexandre Santos writes:

 > I'll like to read several shapefiles, count occurrences in the same
 > coordinate and create a final shapefile with a threshold number of
 > occurrences. I try to convert the shapefiles in ppp object (because I
 > have some part of my data set in shapefile and another in ppp objects)
 > and applied superimpose function [.... ]


The function 'superimpose' in the spatstat package is generic, with
methods for 'ppp' and 'default'.

Your example code applies 'superimpose' to a list of objects of class
'ppp'. This uses the method 'superimpose.ppp', which applies to objects
of class 'ppp' and constructs a new object of class 'ppp'. This task
includes computing the appropriate "observation window" (a component of
the 'ppp' structure) from the observation windows of the input patterns.
There is an option in 'superimpose.ppp' (the argument 'W') to specify
the observation window of the result. You didn't use this option, so
you're expecting the function 'superimpose.ppp' to calculate the
appropriate window. When you have many objects with complicated windows,
this will take a lot of time.
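
As a rough illustration of the window option, here is a minimal sketch
that passes a window through the 'W' argument of 'superimpose'. The toy
patterns P1 and P2 and the rectangle W are invented for this example,
and it assumes you already know a simple window that contains all the
points.

```r
library(spatstat)

# Hypothetical toy patterns on two adjacent unit squares
P1 <- ppp(x = c(0.2, 0.5, 0.8), y = c(0.3, 0.5, 0.7),
          window = owin(c(0, 1), c(0, 1)))
P2 <- ppp(x = c(1.2, 1.5), y = c(0.4, 0.6),
          window = owin(c(1, 2), c(0, 1)))
Plist <- list(P1, P2)

# A simple rectangle assumed to contain every point (an assumption of this sketch)
W <- owin(c(0, 2), c(0, 1))

# Supplying W means the combined window is given rather than computed
# from the windows of the individual patterns
Z <- do.call(superimpose, c(Plist, list(W = W)))
npoints(Z)
```

Whether a plain rectangle is acceptable depends on what you do with the
result afterwards; for counting coincident points the window should not
matter.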

To make this go faster you could simply extract the (x,y) coordinates of
the objects using coords() or as.data.frame(). Then call 'superimpose'
on these data frames, which will invoke 'superimpose.default', which
will concatenate the (x,y) coordinate lists very quickly.

If I understand correctly, your ultimate goal is to have a list of the
unique (x,y) points and their multiplicities.

If you have already superimposed (concatenated) the x, y coordinate
lists, then you can calculate the multiplicities with 'table', or with
the spatstat function 'uniquemap' (the latter is extremely fast).
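
For a small case, the counting step can be sketched in base R alone.
This is only an illustration: the data frame 'xy' is made-up toy data
standing in for the concatenated coordinates, 'aggregate' is used here
in place of 'table' or 'uniquemap', and the threshold value is
arbitrary. It relies on coincident points having exactly equal
coordinates.

```r
# Hypothetical toy data standing in for the concatenated coordinates
xy <- data.frame(x = c(1, 2, 1, 3, 2, 1),
                 y = c(1, 2, 1, 3, 2, 1))

# Give every point a count of one, then add the counts up per (x, y) location
xy$m <- 1L
counts <- aggregate(m ~ x + y, data = xy, FUN = sum)

# Keep only the locations with at least 'threshold' occurrences
threshold <- 2L
frequent <- counts[counts$m >= threshold, ]
frequent
```

The rows of 'frequent' are the locations that meet the threshold, with
their multiplicity in column 'm'.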

However, you don't need to concatenate all the coordinates of all the
point patterns before calculating multiplicities. In big data
applications it would be more efficient to process each point pattern
dataset first, determining the unique (x,y) points and their
multiplicities within each point pattern, and then to merge the results
from the different point patterns. Something like this, if 'Plist' is
your list of point patterns:

        library(spatstat)
        # process each point pattern
        Vlist <- lapply(unname(Plist), function(P) {
               xy <- as.data.frame(P)[, c("x", "y")]
               # um[i] is the index of the first point identical to point i
               um <- uniquemap(xy)
               isun <- (um == seq_along(um))
               # counts as a plain vector, in the same order as xy[isun, ]
               mul <- as.vector(table(um))
               return(cbind(xy[isun, , drop=FALSE], m=mul))
        })
        # concatenate results from all patterns
        V <- do.call(rbind, Vlist)
        # find unique points
        um <- uniquemap(V[, c("x", "y")])
        isun <- (um == seq_along(um))
        U <- V[isun, c("x", "y")]
        # add up the multiplicities of identical points
        m <- as.vector(tapply(V$m, factor(um), sum))

Then U contains the unique locations and m contains their
multiplicities, in the same order as the rows of U.
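
To see why counting within each pattern and then merging gives the same
answer as counting everything at once, here is a small base R check. It
is a sketch only: 'xylist' is invented toy data standing in for the
coordinate tables of two point patterns, and the helper 'count_points'
is a hypothetical name that uses 'aggregate' rather than 'uniquemap'.

```r
# Hypothetical stand-ins for the coordinate tables of two point patterns
xylist <- list(
  data.frame(x = c(0, 0, 1), y = c(0, 0, 1)),
  data.frame(x = c(0, 1, 2), y = c(0, 1, 2))
)

# Hypothetical helper: one row per distinct (x, y), with its count in column m
count_points <- function(xy) {
  xy$m <- 1L
  aggregate(m ~ x + y, data = xy, FUN = sum)
}

# Step 1: unique points and multiplicities within each pattern
per_pattern <- lapply(xylist, count_points)

# Step 2: stack the per-pattern tables and add up the counts per location
merged <- aggregate(m ~ x + y, data = do.call(rbind, per_pattern), FUN = sum)

# Reference: count all points in a single pass
direct <- count_points(do.call(rbind, xylist))
```

The two-step version only ever stacks the reduced per-pattern tables,
which is where the memory saving would come from when patterns contain
many repeated points.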




Prof Adrian Baddeley HonDSc FAA

John Curtin Distinguished Professor

School of Electrical Engineering, Computing and Mathematical Sciences

Curtin University, Perth, Western Australia


I work Wednesdays and Thursdays only


_______________________________________________
R-sig-Geo mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
