Hi Brent; One thing to remember is that finding the center of mass isn't necessarily a statistical problem - unless the points are a sample of some population.
If a point set contains 10 points, that's what it contains, and its center of mass has a physical meaning. Chris Hermansen [email protected] tel+1.604.714.2878 · fax+1.604.733.0631 · mob+1.778.840.4625 Timberline Natural Resource Group · www.timberline.ca 401 · 958 West 8th Avenue · Vancouver BC · Canada · V5Z 1E5 -----Original Message----- From: [email protected] Date: Wed, 16 Sep 2009 19:12:39 To: PostGIS Users Discussion<[email protected]> Subject: Re: [postgis-users] Center of Points Collection Hi guys, One concern I have is with the sample sizes being suggested here as "proof/disproof of concept". Pretty much any statistical sampling approach becomes more robust with increased sample size, so many cases with only a few points are likely to be anomalous. The other aspect that is not being addressed is the question of testing for significance. A mean (avg) of 10 with a SD of 10 suggests the mean is not that useful as a descriptive/representative statistic, a mean of 10 with an SD of 1 is probably reasonable. The avg() function in an RDBMS returns no measure of spread/fit, so is pretty easily misused on datasets where an avg() is not an appropriate statistic. Any statistical value representing a dataset without some measure of variance or fit is quite probably misleading, which is why RDBMS's are not the best tool for such measures, despite their convenience. For example, a polygon is defined as "the set of points contained by ..." but the centroid representing the polygon is often defined by the points delineating the perimeter, which is actually a linestring, not the polygon it contains. There are going to be many cases where this causes problems. One, relatively simple approach (as these things go :-) & somewhat more robust method for calculating a centroid is to generate a grid of points within the bbox of the polygon, toss out those not within the polygon itself, & take the avg(X) & avg(Y) of the remaining points. At least this provides a centroid based on the polygon, rather than the perimeter, but can also generate a centroid outside the polygon for complex or concave shapes. If we really want a statistically robust solution, a database (spatially enabled or not) is generally unlikely to be the ideal tool. Cheers, Brent Wood --- On Thu, 9/17/09, Simon Greener <[email protected]> wrote: > From: Simon Greener <[email protected]> > Subject: Re: [postgis-users] Center of Points Collection > To: "PostGIS Users Discussion" <[email protected]> > Date: Thursday, September 17, 2009, 11:24 AM > Chris, > > Never thought my method was theoretically correct: just had > a bit of fun working out how one would do it given > no database (except Oracle) has a MEDIAN aggregate cf AVG. > > I looked at your URL and decided I didn't have the time to > look at implementing it. > > regards > SImon > On Thu, 17 Sep 2009 02:08:47 +1000, Chris Hermansen > <[email protected]> > wrote: > > > A median in X will either yield a point in the set or > the average of the > > two points bracketing the (theoretical) median. > Same in Y. In 2D the > > medians of both will often produce a point that is not > in either set, > > even in cases where there is not an interpolation > going on. Also what > > does one do about duplicate coordinates? > > > > In the median there is clearly the concept of "half of > the points being > > on one side of the median, half on the other > side". It's this > > "sidedness" in 2D that makes me fret. > > > > Here is an example for which I can't conceive of a > great solution: > > > > POINT(1 6), > > POINT(0 5), > > POINT(0 4), > > POINT(2 3), > > POINT(2 2), > > POINT(2 1), > > POINT(1 0) > > > > What is the median of that? Simon, what your > propose (medians of x and > > y) should I think give POINT(1.5 3) if duplicates are > only counted once > > and POINT(1 3) if not. > > > > Simon Greener wrote: > >> What would happen if you calculated the median of > the x and y > >> ordinates instead of the average? > >> > >> WITH points As > >> ( > >> SELECT st_collect(column1) as the_geom > >> FROM (values ('POINT(0 0)'), > >> > ('POINT(0 1)'), > >> > ('POINT(0 2)'), > >> > ('POINT(1 0)')) as foo > >> ) > >> SELECT astext(st_makepoint(avg(x),avg(y))) > >> FROM ( SELECT st_x(geom) as x, > st_y(geom) as y > >> FROM > (select (ST_Dump(the_geom)).geom as geom from points > >> as foo) as p > >> ORDER > by st_x(geom) ASC, st_y(geom) ASC > >> LIMIT > (select case when mod(ST_NumGeometries(the_geom),2) = > >> 0 then 1 else 2 end from points) > >> > OFFSET (select case when > mod(ST_NumGeometries(the_geom),2) = > >> 0 then ST_NumGeometries(the_geom)/2-1 else > >> (ST_NumGeometries(the_geom)-1)/2 end from points) > >> ) T; > >> -- where p=1 and n=N/2-1 when the number of non > null rows N is even, > >> or p=2 and n=(N-1)/2 when N is odd. > >> > >> "POINT(0 1)" > >> > >> See > >> http://scorreiait.wordpress.com/2008/10/28/how-to-compute-a-median-in-sql > >> > >> S > >> On Wed, 16 Sep 2009 05:25:29 +1000, Chris > Hermansen > >> <[email protected]> > wrote: > >> > >>> Hmm on further thought I'm not completely > clear on the definition of > >>> "median" in two dimensions, either. I > found this interesting article on > >>> Google, including a way cool Java app for > calculating and graphing: > >>> > >>> http://www.tiac.net/~sw/2007/11/2D_median/index.html > >>> > >>> [email protected] > wrote: > >>>> Hi guys, > >>>> > >>>> A bit more difficult, & way out in > left field, but if you use PL/R > >>>> to create a median function for Postgres, > you could build your point > >>>> from the median(X) & (Y) values > instead of the average. > >>>> > >>>> Where this would actually lie obviously > depends on the distribution > >>>> of the points. The centroid is most > affected by (actually defined > >>>> by) outlying points, the avg somewhat less > & the median less still. > >>>> > >>>> Of course once you have PL/R to play with, > you have much more > >>>> flexibility to look at returning > statistics from datasets than just > >>>> the median. > >>>> > >>>> Cheers, > >>>> > >>>> Brent Wood > >>>> > >>>> > >>>> > >>>> > >>>> --- On Tue, 9/15/09, Kevin Neufeld <[email protected]> > wrote: > >>>> > >>>> > >>>>> From: Kevin Neufeld <[email protected]> > >>>>> Subject: Re: [postgis-users] Center of > Points Collection > >>>>> To: "PostGIS Users Discussion" <[email protected]> > >>>>> Date: Tuesday, September 15, 2009, > 8:51 AM > >>>>> Paul Ramsey wrote: > >>>>> > >>>>>> Faster than creating a multipoint > is to recognize that > >>>>>> > >>>>> ST_Centroid() > >>>>> > >>>>>> is just going to return the center > of the bbox of the > >>>>>> > >>>>> collection > >>>>> > >>>>>> anyways... > >>>>>> > >>>>> Unfortunately, Paul, ST_Centroid > returns the center of > >>>>> mass, not the center of the bbox. > >>>>> > >>>>> SELECT > astext(st_centroid(st_collect(column1))), > >>>>> FROM (values ('POINT(0 0)'), > >>>>> > >>>>> ('POINT(0 1)'), > >>>>> > >>>>> ('POINT(0 2)'), > >>>>> > >>>>> ('POINT(1 0)')) as foo; > >>>>> astext > >>>>> ------------------ > >>>>> POINT(0.25 0.75) > >>>>> (1 row) > >>>>> > >>>>> Your second post, taking the avg of > the x,y does seem to be > >>>>> the nice approach, and produces the > same results as > >>>>> ST_Centroid - the center of mass. > >>>>> > >>>>> SELECT > astext(st_makepoint(avg(st_x(column1)), > >>>>> avg(st_y(column1)))) > >>>>> FROM (values ('POINT(0 0)'), > >>>>> > >>>>> ('POINT(0 1)'), > >>>>> > >>>>> ('POINT(0 2)'), > >>>>> > >>>>> ('POINT(1 0)')) as foo; > >>>>> astext > >>>>> ------------------ > >>>>> POINT(0.25 0.75) > >>>>> (1 row) > >>>>> > >>>>> If Dustin is after the center of the > collection, then > >>>>> something along your first suggestion > might be more > >>>>> appropriate. > >>>>> (taking the center of the extents) > >>>>> > >>>>> Cheers, > >>>>> Kevin > >>>>> >_______________________________________________ > >>>>> postgis-users mailing list > >>>>> [email protected] > >>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users > >>>>> > >>>>> > >>>> >_______________________________________________ > >>>> postgis-users mailing list > >>>> [email protected] > >>>> http://postgis.refractions.net/mailman/listinfo/postgis-users > >>>> > >>> > >>> > >> > >> > > > > > > > -- > SpatialDB Advice and Design, Solutions Architecture and > Programming, > Oracle Database 10g Administrator Certified Associate; > Oracle Database 10g SQL Certified Professional > Oracle Spatial, SQL Server, PostGIS, MySQL, ArcSDE, > Manifold GIS, FME, Radius Topology and Studio Specialist. > 39 Cliff View Drive, Allens Rivulet, 7150, Tasmania, > Australia. > Website: www.spatialdbadvisor.com > Email: [email protected] > Voice: +61 362 396397 > Mobile: +61 418 396391 > Skype: sggreener > Longitude: 147.20515 (147° 12' 18" E) > Latitude: -43.01530 (43° 00' 55" S) > NAC:W80CK 7SWP3 >_______________________________________________ > postgis-users mailing list > [email protected] > http://postgis.refractions.net/mailman/listinfo/postgis-users > _______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users _______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users
