Re: [R] Moran I for very large data set
First, 14000 is not a large data set, unless you are trying to create a dense matrix, which will probably tax your computer and is not necessary.

Second, you haven't indicated how you are doing this, by quoting the salient parts of the code you are using - it may well be that your approach is flawed, but nobody can see over your shoulder on the list. For instance, if you are using dnearneigh() in spdep and have set a maximum distance large enough to include all the observations, you will likely run out of memory (note that the distance is in km). Just re-running a script is not a robust way to proceed; you need to run it line by line to see where the bottleneck is. It may be that projecting the data will solve your problem, if it is the Great Circle computations that are burdensome. For me:

library(spdep)
set.seed(1)
crds <- cbind(runif(14000, 0, 10), runif(14000, 0, 10))
k1 <- knn2nb(knearneigh(crds, 1, longlat=TRUE))
k1d <- nbdists(k1, crds, longlat=TRUE)
max(unlist(k1d))
[1] 18.54627
system.time(dnb <- dnearneigh(crds, 0, 18, longlat=TRUE))
   user  system elapsed
 53.864   0.019  55.418
system.time(lw <- nb2listw(dnb, zero.policy=TRUE))
   user  system elapsed
  0.909   0.008   0.918
system.time(mt <- moran.test(rnorm(14000), lw, zero.policy=TRUE))
   user  system elapsed
  8.610   0.006   8.801

with the R process using at most 140MB.

Third, you should consider using the R-sig-geo list, where a follow-up would have been forthcoming more quickly.

Hope this helps,
Roger

Watmough G. wrote:
> Hi
>
> Are there any more efficient ways of calculating the neighbourhood object
> for large datasets? I am trying to compute Moran I statistics for a very
> large data set (over 14,000 points). I have been using moran.test from the
> spdep package, and everything works fine for a small data set (200 points).
> However, applying the same script to the whole dataset is taking days to
> compute (it has so far been running for 5 days with still no results). This
> is no surprise, given the number of computations required. I have found
> that calculating planar distances works much quicker, but Great Circle
> distances are required.
>
> Thanks
> Gary Watmough

-- 
Roger Bivand
Economic Geography Section, Department of Economics
Norwegian School of Economics and Business Administration
Helleveien 30, N-5045 Bergen, Norway

-- 
View this message in context: http://r.789695.n4.nabble.com/Moran-I-for-very-large-data-set-tp3063474p3065310.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
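The "projecting the data" suggestion above can be sketched as follows. This is a hypothetical illustration, not code from the thread: the CRS strings, the choice of UTM zone 32N, and the 18 km threshold are placeholder assumptions, using sp and rgdal for the reprojection.

```r
## Sketch: project lon/lat points to a planar CRS so dnearneigh() can use
## fast Euclidean distances instead of Great Circle distances.
## ASSUMPTIONS (placeholders, not from the thread): points are WGS84
## lon/lat, and UTM zone 32N is an acceptable planar CRS for the area.
library(sp)
library(rgdal)   # provides spTransform()
library(spdep)

set.seed(1)
crds <- cbind(runif(14000, 0, 10), runif(14000, 0, 10))
pts <- SpatialPoints(crds, proj4string = CRS("+proj=longlat +datum=WGS84"))
pts_utm <- spTransform(pts, CRS("+proj=utm +zone=32 +datum=WGS84"))

## Planar coordinates are in metres, so the 18 km threshold becomes 18000,
## and longlat=TRUE is no longer needed.
dnb <- dnearneigh(coordinates(pts_utm), 0, 18000)
```

Whether planar distances are acceptable depends on the extent of the study area; if the points span several UTM zones, the projection distortion may matter more than the speed gain.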
Re: [R] Moran I for very large data set
> Date: Tue, 30 Nov 2010 04:51:10 -0800
> From: roger.biv...@nhh.no
> To: r-help@r-project.org
> Subject: Re: [R] Moran I for very large data set
>
> [...] Just re-running a script is not a robust way to proceed, you need
> to run it line by line to see where the bottleneck is. It may be that
> projecting the data will solve your problem if it is the Great Circle
> computations that are burdensome. [...]

I guess I'd add to the posting guidelines that if you are reporting a performance issue, you should make a token effort to determine and post which resource is really limiting your performance (CPU, page faults, IO, etc.) - either that, or fedex us your machine... On Windows, the CPU usage reported by Task Manager will drop to almost zero as the process blocks for IO (page faults are IO), and in Windows 7 they seem to have greatly expanded Task Manager, although you don't seem to be able to reduce its output to text for easy sharing.
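From within R itself, the "run it line by line to find the bottleneck" advice can be sketched with the built-in profiler, Rprof(). This is an illustrative sketch, not code from the thread: the output file name is arbitrary, and the point count is reduced so it runs quickly.

```r
## Sketch: profile the neighbour/weights/test pipeline to see which step
## dominates CPU time. Point count reduced (2000) purely for speed; the
## profile file name is an arbitrary choice.
library(spdep)
set.seed(1)
crds <- cbind(runif(2000, 0, 10), runif(2000, 0, 10))

Rprof("moran_prof.out")
dnb <- dnearneigh(crds, 0, 18, longlat = TRUE)   # typically the slow step
lw  <- nb2listw(dnb, zero.policy = TRUE)
mt  <- moran.test(rnorm(2000), lw, zero.policy = TRUE)
Rprof(NULL)

## Rank functions by self time to locate the bottleneck
head(summaryRprof("moran_prof.out")$by.self)
```

Note that Rprof() only accounts for time spent inside R; if the process is blocked on IO or paging, as described above, operating-system tools (top, Task Manager) are the place to look instead.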
[R] Moran I for very large data set
Hi

Are there any more efficient ways of calculating the neighbourhood object for large datasets? I am trying to compute Moran I statistics for a very large data set (over 14,000 points). I have been using moran.test from the spdep package, and everything works fine for a small data set (200 points). However, applying the same script to the whole dataset is taking days to compute (it has so far been running for 5 days with still no results). This is no surprise, given the number of computations required. I have found that calculating planar distances works much quicker, but Great Circle distances are required.

Thanks
Gary Watmough
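One commonly used cheaper neighbourhood object, shown here as a sketch rather than as advice from the thread, is k nearest neighbours: knearneigh() avoids the all-pairs distance-band search of dnearneigh(), and it accepts longlat = TRUE, so Great Circle distances are still used. The choice k = 5 is an arbitrary illustrative assumption.

```r
## Sketch: k-nearest-neighbour weights as a faster alternative to a
## distance band. ASSUMPTION: k = 5 is an arbitrary illustrative choice.
## longlat = TRUE keeps the distances Great Circle.
library(spdep)
set.seed(1)
crds <- cbind(runif(14000, 0, 10), runif(14000, 0, 10))

knb <- knn2nb(knearneigh(crds, k = 5, longlat = TRUE))
lw  <- nb2listw(knb)            # row-standardised weights by default
moran.test(rnorm(14000), lw)
```

One caveat: k-nearest-neighbour relations are not symmetric (A's nearest neighbour need not have A among its own); spdep's make.sym.nb() can symmetrise the nb object if a symmetric weights structure is wanted.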