Re: [R] Moran I for very large data set

2010-11-30 Thread Roger Bivand

First, 14000 observations is not a large data set, unless you are trying to
create a dense matrix, which will probably tax your computer and is not
necessary.

Second, you haven't indicated how you are doing this by quoting the salient
parts of the code you are using; it may well be that your approach is flawed,
but nobody can see over your shoulder on the list. For instance, if you are
using dnearneigh() in spdep and have set a maximum distance large enough to
include all the observations as neighbours, you will likely run out of memory
(note that the distance is in km). Just re-running a script is not a robust
way to proceed; you need to run it line by line to see where the bottleneck
is. It may be that projecting the data will solve your problem, if it is the
Great Circle computations that are burdensome.
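If projection is the way to go, here is a minimal sketch of that route, assuming
the points are a two-column lon/lat matrix called crds and that the sp and rgdal
packages are available; the UTM zone below is purely illustrative and must be
chosen to match your study area:

 library(sp)
 library(rgdal)   # provides spTransform() for Spatial* objects
 library(spdep)

 ## wrap the lon/lat matrix as SpatialPoints with a geographic CRS
 pts <- SpatialPoints(crds, proj4string = CRS("+proj=longlat +datum=WGS84"))

 ## project to a planar CRS; zone 30 is an arbitrary illustrative choice
 pts_km <- spTransform(pts, CRS("+proj=utm +zone=30 +datum=WGS84 +units=km"))

 ## with planar coordinates, longlat = FALSE and distances are in the
 ## units of the projected CRS (km here), so no Great Circle calculations
 dnb <- dnearneigh(coordinates(pts_km), 0, 18, longlat = FALSE)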

For me:

 > library(spdep)
 > set.seed(1)
 > crds <- cbind(runif(14000, 0, 10), runif(14000, 0, 10))
 > k1 <- knn2nb(knearneigh(crds, 1, longlat=TRUE))
 > k1d <- nbdists(k1, crds, longlat=TRUE)
 > max(unlist(k1d))
 [1] 18.54627
 > system.time(dnb <- dnearneigh(crds, 0, 18, longlat=TRUE))
    user  system elapsed 
  53.864   0.019  55.418 
 > system.time(lw <- nb2listw(dnb, zero.policy=TRUE))
    user  system elapsed 
   0.909   0.008   0.918 
 > system.time(mt <- moran.test(rnorm(14000), lw, zero.policy=TRUE))
    user  system elapsed 
   8.610   0.006   8.801 

with the R process using at most 140MB.
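If the distance-band search itself becomes the limiting step at larger sizes, a
k-nearest-neighbour weights object built with the same functions is usually much
cheaper to construct; a rough sketch only, using crds as above and k = 10 as an
arbitrary illustrative choice (k-NN neighbour lists are asymmetric, hence the
symmetrisation step):

 library(spdep)
 knn <- knearneigh(crds, k = 10, longlat = TRUE)
 nb  <- make.sym.nb(knn2nb(knn))      # symmetrise the k-NN neighbour list
 lw  <- nb2listw(nb, style = "W", zero.policy = TRUE)
 moran.test(rnorm(14000), lw, zero.policy = TRUE)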

Third, you should consider using the R-sig-geo list, where a follow-up would
have been forthcoming more quickly.

Hope this helps,

Roger


Watmough G. wrote:
 
 Hi
 
 Are there any more efficient ways of calculating the neighbourhood object
 for large datasets?
 
 I am trying to compute Moran I statistics for a very large data set (over
 14,000 points).  I have been using moran.test from the spdep package and
 everything works fine for a small data set (200 points).  However,
 applying the same script to the whole dataset is taking days to compute
 (it so far has been going for 5 days and still no results).  This is no
 surprise due to the number of computations required.
 
 I have found that calculating planar distances works much quicker, but
 Great Circle distances are required.
 
 Thanks
 
 Gary Watmough
 
 


-
Roger Bivand
Economic Geography Section
Department of Economics
Norwegian School of Economics and Business Administration
Helleveien 30
N-5045 Bergen, Norway




Re: [R] Moran I for very large data set

2010-11-30 Thread Mike Marchywka

 Date: Tue, 30 Nov 2010 04:51:10 -0800
 From: roger.biv...@nhh.no
 To: r-help@r-project.org
 Subject: Re: [R] Moran I for very large data set


 First, 14000 observations is not a large data set, unless you are trying to
 create a dense matrix, which will probably tax your computer and is not
 necessary.

 Second, you haven't indicated how you are doing this by quoting the salient
 parts of the code you are using; it may well be that your approach is
 flawed, but nobody can see over your shoulder on the list. For instance, if
 you are using dnearneigh() in spdep and have set a maximum distance large
 enough to include all the observations as neighbours, you will likely run
 out of memory (note that the distance is in km). Just re-running a script is
 not a robust way to proceed; you need to run it line by line to see where
 the bottleneck is. It may be that projecting the data will solve your
 problem, if it is the Great Circle computations that are burdensome.


I'd add to the posting guidelines that if you are reporting a performance
issue, make at least a token effort to determine and post which resource is
actually limiting your performance (CPU, page faults, I/O, etc.); either that
or FedEx us your machine. On Windows, the CPU usage reported by Task Manager
will drop to almost zero while the process blocks for I/O (page faults are
I/O), and Windows 7 has greatly expanded Task Manager, although there still
seems to be no way to reduce its output to text for easy sharing.
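For example, here is a minimal sketch of checking this from inside R itself,
assuming the spdep objects from the example quoted above; Rprof() shows which
calls dominate and whether allocation is heavy, and gc() gives a rough picture
of current memory use:

 ## profile the slow step: which calls dominate, and is memory growing?
 Rprof("moran_profile.out", memory.profiling = TRUE)
 lw <- nb2listw(dnearneigh(crds, 0, 18, longlat = TRUE), zero.policy = TRUE)
 Rprof(NULL)
 summaryRprof("moran_profile.out", memory = "both")

 gc()   # current memory use of the R process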



[R] Moran I for very large data set

2010-11-29 Thread Watmough G.
Hi

Are there any more efficient ways of calculating the neighbourhood object for 
large datasets?

I am trying to compute Moran I statistics for a very large data set (over 
14,000 points).  I have been using moran.test from the spdep package and 
everything works fine for a small data set (200 points).  However, applying the 
same script to the whole dataset is taking days to compute (it so far has been 
going for 5 days and still no results).  This is no surprise due to the number 
of computations required.

I have found that calculating planar distances works much quicker, but Great
Circle distances are required.

Thanks

Gary Watmough


