Hi Chris, Agreed, but my point is that a simple mean is not always going to be a meaningful value, nor is a centroid, to represent a set of values. And an RDBMS does not easily tell you how representative your of your dataset the mean/centroid is.
Cheers, Brent --- On Thu, 9/17/09, Chris Hermansen <[email protected]> wrote: > From: Chris Hermansen <[email protected]> > Subject: Re: [postgis-users] Center of Points Collection > To: [email protected], "PostGIS Users Discussion" > <[email protected]> > Date: Thursday, September 17, 2009, 3:19 PM > Hi Brent; > > One thing to remember is that finding the center of mass > isn't necessarily a statistical problem - unless the points > are a sample of some population. > > If a point set contains 10 points, that's what it contains, > and its center of mass has a physical meaning. > Chris Hermansen [email protected] > tel+1.604.714.2878 · fax+1.604.733.0631 · > mob+1.778.840.4625 > Timberline Natural Resource Group · www.timberline.ca > 401 · 958 West 8th Avenue · Vancouver BC · Canada > · V5Z 1E5 > > > -----Original Message----- > From: [email protected] > > Date: Wed, 16 Sep 2009 19:12:39 > To: PostGIS Users Discussion<[email protected]> > Subject: Re: [postgis-users] Center of Points Collection > > > Hi guys, > > One concern I have is with the sample sizes being suggested > here as "proof/disproof of concept". > > Pretty much any statistical sampling approach becomes more > robust with increased sample size, so many cases with only a > few points are likely to be anomalous. > > The other aspect that is not being addressed is the > question of testing for significance. > > A mean (avg) of 10 with a SD of 10 suggests the mean is not > that useful as a descriptive/representative statistic, a > mean of 10 with an SD of 1 is probably reasonable. The avg() > function in an RDBMS returns no measure of spread/fit, so is > pretty easily misused on datasets where an avg() is not an > appropriate statistic. > > Any statistical value representing a dataset without some > measure of variance or fit is quite probably misleading, > which is why RDBMS's are not the best tool for such > measures, despite their convenience. > > For example, a polygon is defined as "the set of points > contained by ..." but the centroid representing the polygon > is often defined by the points delineating the perimeter, > which is actually a linestring, not the polygon it contains. > There are going to be many cases where this causes > problems. > > One, relatively simple approach (as these things go :-) > & somewhat more robust method for calculating a centroid > is to generate a grid of points within the bbox of the > polygon, toss out those not within the polygon itself, & > take the avg(X) & avg(Y) of the remaining points. At > least this provides a centroid based on the polygon, rather > than the perimeter, but can also generate a centroid outside > the polygon for complex or concave shapes. > > If we really want a statistically robust solution, a > database (spatially enabled or not) is generally unlikely to > be the ideal tool. > > Cheers, > > Brent Wood > > > --- On Thu, 9/17/09, Simon Greener <[email protected]> > wrote: > > > From: Simon Greener <[email protected]> > > Subject: Re: [postgis-users] Center of Points > Collection > > To: "PostGIS Users Discussion" <[email protected]> > > Date: Thursday, September 17, 2009, 11:24 AM > > Chris, > > > > Never thought my method was theoretically correct: > just had > > a bit of fun working out how one would do it given > > no database (except Oracle) has a MEDIAN aggregate cf > AVG. > > > > I looked at your URL and decided I didn't have the > time to > > look at implementing it. > > > > regards > > SImon > > On Thu, 17 Sep 2009 02:08:47 +1000, Chris Hermansen > <[email protected]> > > wrote: > > > > > A median in X will either yield a point in the > set or > > the average of the > > > two points bracketing the (theoretical) > median. > > Same in Y. In 2D the > > > medians of both will often produce a point that > is not > > in either set, > > > even in cases where there is not an > interpolation > > going on. Also what > > > does one do about duplicate coordinates? > > > > > > In the median there is clearly the concept of > "half of > > the points being > > > on one side of the median, half on the other > > side". It's this > > > "sidedness" in 2D that makes me fret. > > > > > > Here is an example for which I can't conceive of > a > > great solution: > > > > > > POINT(1 6), > > > POINT(0 5), > > > POINT(0 4), > > > POINT(2 3), > > > POINT(2 2), > > > POINT(2 1), > > > POINT(1 0) > > > > > > What is the median of that? Simon, what your > > propose (medians of x and > > > y) should I think give POINT(1.5 3) if duplicates > are > > only counted once > > > and POINT(1 3) if not. > > > > > > Simon Greener wrote: > > >> What would happen if you calculated the > median of > > the x and y > > >> ordinates instead of the average? > > >> > > >> WITH points As > > >> ( > > >> SELECT st_collect(column1) as the_geom > > >> FROM (values ('POINT(0 0)'), > > >> > > ('POINT(0 1)'), > > >> > > ('POINT(0 2)'), > > >> > > ('POINT(1 0)')) as foo > > >> ) > > >> SELECT astext(st_makepoint(avg(x),avg(y))) > > >> FROM ( SELECT st_x(geom) as x, > > st_y(geom) as y > > >> FROM > > (select (ST_Dump(the_geom)).geom as geom from points > > >> as foo) as p > > >> ORDER > > by st_x(geom) ASC, st_y(geom) ASC > > >> LIMIT > > (select case when mod(ST_NumGeometries(the_geom),2) = > > >> 0 then 1 else 2 end from points) > > >> > > OFFSET (select case when > > mod(ST_NumGeometries(the_geom),2) = > > >> 0 then ST_NumGeometries(the_geom)/2-1 else > > >> (ST_NumGeometries(the_geom)-1)/2 end from > points) > > >> ) T; > > >> -- where p=1 and n=N/2-1 when the number of > non > > null rows N is even, > > >> or p=2 and n=(N-1)/2 when N is odd. > > >> > > >> "POINT(0 1)" > > >> > > >> See > > >> http://scorreiait.wordpress.com/2008/10/28/how-to-compute-a-median-in-sql > > >> > > >> S > > >> On Wed, 16 Sep 2009 05:25:29 +1000, Chris > > Hermansen > > >> <[email protected]> > > wrote: > > >> > > >>> Hmm on further thought I'm not > completely > > clear on the definition of > > >>> "median" in two dimensions, either. I > > found this interesting article on > > >>> Google, including a way cool Java app > for > > calculating and graphing: > > >>> > > >>> http://www.tiac.net/~sw/2007/11/2D_median/index.html > > >>> > > >>> [email protected] > > wrote: > > >>>> Hi guys, > > >>>> > > >>>> A bit more difficult, & way out > in > > left field, but if you use PL/R > > >>>> to create a median function for > Postgres, > > you could build your point > > >>>> from the median(X) & (Y) values > > instead of the average. > > >>>> > > >>>> Where this would actually lie > obviously > > depends on the distribution > > >>>> of the points. The centroid is most > > affected by (actually defined > > >>>> by) outlying points, the avg somewhat > less > > & the median less still. > > >>>> > > >>>> Of course once you have PL/R to play > with, > > you have much more > > >>>> flexibility to look at returning > > statistics from datasets than just > > >>>> the median. > > >>>> > > >>>> Cheers, > > >>>> > > >>>> Brent Wood > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> --- On Tue, 9/15/09, Kevin Neufeld > <[email protected]> > > wrote: > > >>>> > > >>>> > > >>>>> From: Kevin Neufeld <[email protected]> > > >>>>> Subject: Re: [postgis-users] > Center of > > Points Collection > > >>>>> To: "PostGIS Users Discussion" > <[email protected]> > > >>>>> Date: Tuesday, September 15, > 2009, > > 8:51 AM > > >>>>> Paul Ramsey wrote: > > >>>>> > > >>>>>> Faster than creating a > multipoint > > is to recognize that > > >>>>>> > > >>>>> ST_Centroid() > > >>>>> > > >>>>>> is just going to return the > center > > of the bbox of the > > >>>>>> > > >>>>> collection > > >>>>> > > >>>>>> anyways... > > >>>>>> > > >>>>> Unfortunately, Paul, ST_Centroid > > returns the center of > > >>>>> mass, not the center of the > bbox. > > >>>>> > > >>>>> SELECT > > astext(st_centroid(st_collect(column1))), > > >>>>> FROM (values ('POINT(0 0)'), > > >>>>> > > >>>>> ('POINT(0 1)'), > > >>>>> > > >>>>> ('POINT(0 2)'), > > >>>>> > > >>>>> ('POINT(1 0)')) as foo; > > >>>>> astext > > >>>>> ------------------ > > >>>>> POINT(0.25 0.75) > > >>>>> (1 row) > > >>>>> > > >>>>> Your second post, taking the avg > of > > the x,y does seem to be > > >>>>> the nice approach, and produces > the > > same results as > > >>>>> ST_Centroid - the center of > mass. > > >>>>> > > >>>>> SELECT > > astext(st_makepoint(avg(st_x(column1)), > > >>>>> avg(st_y(column1)))) > > >>>>> FROM (values ('POINT(0 0)'), > > >>>>> > > >>>>> ('POINT(0 1)'), > > >>>>> > > >>>>> ('POINT(0 2)'), > > >>>>> > > >>>>> ('POINT(1 0)')) as foo; > > >>>>> astext > > >>>>> ------------------ > > >>>>> POINT(0.25 0.75) > > >>>>> (1 row) > > >>>>> > > >>>>> If Dustin is after the center of > the > > collection, then > > >>>>> something along your first > suggestion > > might be more > > >>>>> appropriate. > > >>>>> (taking the center of the > extents) > > >>>>> > > >>>>> Cheers, > > >>>>> Kevin > > >>>>> > >_______________________________________________ > > >>>>> postgis-users mailing list > > >>>>> [email protected] > > >>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users > > >>>>> > > >>>>> > > >>>> > >_______________________________________________ > > >>>> postgis-users mailing list > > >>>> [email protected] > > >>>> http://postgis.refractions.net/mailman/listinfo/postgis-users > > >>>> > > >>> > > >>> > > >> > > >> > > > > > > > > > > > > -- > > SpatialDB Advice and Design, Solutions Architecture > and > > Programming, > > Oracle Database 10g Administrator Certified > Associate; > > Oracle Database 10g SQL Certified Professional > > Oracle Spatial, SQL Server, PostGIS, MySQL, ArcSDE, > > Manifold GIS, FME, Radius Topology and Studio > Specialist. > > 39 Cliff View Drive, Allens Rivulet, 7150, Tasmania, > > Australia. > > Website: www.spatialdbadvisor.com > > Email: [email protected] > > Voice: +61 362 396397 > > Mobile: +61 418 396391 > > Skype: sggreener > > Longitude: 147.20515 (147° 12' 18" E) > > Latitude: -43.01530 (43° 00' 55" S) > > NAC:W80CK 7SWP3 > >_______________________________________________ > > postgis-users mailing list > > [email protected] > > http://postgis.refractions.net/mailman/listinfo/postgis-users > > > _______________________________________________ > postgis-users mailing list > [email protected] > http://postgis.refractions.net/mailman/listinfo/postgis-users > _______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users
