Hey, I did the same thing using PLR and a minimal spanning tree. It was fast, easy, and clustering method could be changed at will (see all classical unsupervised learning).
Cheers, Rémi-C 2014-04-24 12:53 GMT+02:00 Paragon Corporation <[email protected]>: > Quick glance at your plpgsql function. It won't be able to use a > spatial index. To utilize a spatial index, you'll need to change your > construct > > ST_Distance(a.geom, b.geom) <= 4891.96981025 * 40 > > to: > ST_DWithin(a.geom, b.geom, 4891.96981025 * 40) > > Also make sure you have spatial indexes on your points table. > > Hope that helps, > Regina > http://www.postgis.us > http://postgis.net > > > > ------------------------------ > *From:* [email protected] [mailto: > [email protected]] *On Behalf Of *Matthias Ludwig > *Sent:* Wednesday, April 23, 2014 10:53 AM > *To:* [email protected] > *Subject:* [postgis-users] Distance based clustering (like in Openlayers > orLeaflet) > > Hi, > I try to implement a simple distance based clustering in PostgreSQL. > It's working the following way: > 1. loop as long as there are non clustered points > 2. took one point that is not clustered > 3. define this as a new cluster > 4. get all surrounding points in a defined distance and add them into > the cluster > 5. calculate the centroid for each cluster and count cluster points > > The calculation speed depends mostly on the count of points and the scale > used. In the moment i'am using a python script and it took around 1sec for > 1000 points for TMS zoomlevel 5 (distance = 4891.96981025m * 40 pixel) and > around 3sec at zoomlevel 15 (distance = 4.78m * 40 pixel). Implementing > it with pgsql doesn't speeds it up. > Is it possible to do the work completely with SQL and maybe with just one > query? I'am not sure if I understood the WITH RECURSIVE concept > completely...but shouldn't it by possible to replace the iteration(2.-4.) > with a WITH RECURSIVE statement? Are there any other suggestion to speed it > up or a better solution? > > Greetings > > > CREATE TABLE points > ( > id_point serial PRIMARY KEY, > geom geometry(Point,3857), > cluster_nr integer > ); > INSERT INTO points (geom) > SELECT ST_SetSrid(ST_MakePoint(12.5 + random()*5, 51 + random()*5), 3857) > FROM generate_series(0, 1000); > > CREATE TABLE pg_clusters > ( > gid integer, > cluster_feature_ids integer[], > count integer, > geom geometry > ); > > -- select first point, which isn't clustered yet > SELECT id_point > FROM points > WHERE cluster_nr IS NULL > LIMIT 1; > -- use first point and get all surrounding points in defined distance > including the point itself > -- use them as first cluster > WITH first_cluster AS ( > SELECT a.id_point AS cluster_nr, > b.id_point, > b.geom AS geom > FROM points a, > points b > WHERE ST_Distance(a.geom, b.geom) <= 4891.96981025 * 40 > AND a.id_point = 185 > AND a.cluster_nr IS NULL > ) > UPDATE points AS a > SET cluster_nr = b.cluster_nr > FROM first_cluster AS b > WHERE a.id_point = b.id_point; > --> choose next id > --> calculate next cluster > etc. > -- calculate the centroid of a cluster and count cluster points > DELETE FROM pg_clusters; > WITH clusters AS ( > SELECT cluster_nr, array_agg(id_point) AS feature_ids, > count(id_point), ST_Centroid(ST_Collect(geom)) AS geom > FROM points > GROUP BY cluster_nr > ) > INSERT INTO pg_clusters(gid, cluster_feature_ids, count, geom) > SELECT * FROM clusters; > > _______________________________________________ > postgis-users mailing list > [email protected] > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users >
_______________________________________________ postgis-users mailing list [email protected] http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
