Hi all,

Thanks to your ideas, I was able to implement spatial clustering. I queried
the data with Perl and DBD::Pg (Perl is my tool of choice for most things)
and used Statistics::R to bridge to R. The kmeans function is built into R's
standard stats package. Ten lines of code, and it was all done.

    use Statistics::R;

    # @lon and @lat hold the point coordinates fetched via DBD::Pg
    my $R = Statistics::R->new();
    my $clusters = 100;    # number of k-means clusters
    $R->set( 'n.clusters', $clusters );
    $R->set( 'lon', \@lon );
    $R->set( 'lat', \@lat );
    $R->run(q`sample.data <- cbind(lon, lat)`);
    $R->run(q`cl <- kmeans(sample.data, n.clusters)`);

    # Cluster centers (column 1 = lon, column 2 = lat) and points per cluster
    my $lon  = $R->get( 'as.numeric(cl$centers[,1])' );
    my $lat  = $R->get( 'as.numeric(cl$centers[,2])' );
    my $size = $R->get( 'cl$size' );
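
For sending the summary to the browser, the three results can be zipped into
one record per cluster and serialized. The following is only a minimal sketch:
it assumes that get() hands back array references for vector results, the
helper name clusters_to_records is made up for illustration, and the sample
values are invented stand-ins for real output. JSON::PP is used because it
ships with recent Perls.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: zip parallel array refs (center longitudes, center
# latitudes, cluster sizes) into a list of hash refs, one per cluster.
sub clusters_to_records {
    my ($lon, $lat, $size) = @_;
    die "array lengths differ\n"
        unless @$lon == @$lat && @$lat == @$size;
    return [
        map { +{ lon => $lon->[$_], lat => $lat->[$_], size => $size->[$_] } }
            0 .. $#{$lon}
    ];
}

# Invented sample values standing in for what $R->get(...) would return.
my $lon  = [ -93.25, -89.40, -87.65 ];
my $lat  = [  44.98,  43.07,  41.88 ];
my $size = [   1200,    850,   2300 ];

my $records = clusters_to_records($lon, $lat, $size);

# canonical() sorts hash keys so the output is stable between runs.
my $json = JSON::PP->new->canonical->encode($records);
print "$json\n";
```

The client then only has to plot one marker per record, scaled by size.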


It takes about 8 seconds for 120K records, but I am sure I can make it a tad
faster. That said, it doesn't really matter because once I have calculated
the clusters, I store them on disk using Storable. Subsequent calls take a
couple of hundred milliseconds.
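
The caching step can be sketched roughly as follows. This is an illustration
under assumptions, not my actual code: the helper name cached_clusters, the
cache file name, and the sample record are made up, and the code ref stands
in for the expensive R call. store and retrieve are the regular Storable
functions.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: return the cached result if the cache file exists,
# otherwise run the expensive computation and store its result on disk.
sub cached_clusters {
    my ($cache_file, $compute) = @_;
    return retrieve($cache_file) if -e $cache_file;
    my $result = $compute->();
    store($result, $cache_file);
    return $result;
}

# A temporary directory keeps this sketch self-contained.
my $dir        = tempdir( CLEANUP => 1 );
my $cache_file = File::Spec->catfile( $dir, 'clusters.sto' );

# Stand-in for the k-means run; counts how often it is actually called.
my $calls   = 0;
my $compute = sub {
    $calls++;
    return [ { lon => -93.25, lat => 44.98, size => 1200 } ];
};

my $first  = cached_clusters( $cache_file, $compute );    # computes and stores
my $second = cached_clusters( $cache_file, $compute );    # read back from disk

print "expensive step ran $calls time(s)\n";
```

A real version would also need some way to invalidate the cache when the
underlying table changes, for example by deleting the file.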


On Dec 6, 2011, at 1:58 AM, Phil James wrote:

> I would use something that is designed to do this analysis - R would be an 
> obvious choice but there are others - using Python you can connect to Postgis 
> to grab the data and then rpy to run R commands from python - we have used 
> this configuration successfully using Django and it works fast enough for the 
> web generating an image - this could of course be optimised to cache outputs 
> if performance is an issue.
> 
>>> ..
>> 
>> 
>> Reading the above made me realize that I should have rephrased my question 
>> -- I don't want to create images on the server side. I realize now that what 
>> I really want to do is to do spatial clustering on the server side and then 
>> send the data to the browser. I wrote my own very naive clustering routine 
>> in Perl, and also tried Algorithm::KMeans [1]. This kind of analysis allows 
>> me to create a summary of my data that I can then plot on the client (see 
>> image at [2]).
>> 
>> Of course, my algorithm is way too naive, and waaaaay too slow, although I 
>> *can* compute the summary and cache it using Storable.
>> 
>> So, here is my rephrased question -- I am looking to do spatial clustering 
>> on my Pg data. The added complication is that I do not have access to 
>> WKTRaster.
>> 
>> 
>> [1] http://search.cpan.org/~avikak/Algorithm-KMeans-1.30/lib/Algorithm/KMeans.pm
>> [2] http://dl.dropbox.com/u/3526821/occurrences.png
>> 




