Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-25 Thread Markus Metz
Glynn Clements wrote:

 Markus Metz wrote:

 Please try the new addon v.centerpoint [0]. It calculates various
 center points for point clouds, lines, and areas. Standard options are
 the geometric mean (center of gravity)

 That's the arithmetic mean.

 The geometric mean of a set of N values is the Nth root of the product
 of the values (i.e. the logarithm of the geometric mean is the
 arithmetic mean of the logarithms of the values). It wouldn't be
 meaningful to compute the geometric mean of coordinate data.

Right, I fixed the documentation. Internally, the arithmetic mean has
been used already to calculate centers of gravity.

Markus M
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-24 Thread Glynn Clements

Markus Metz wrote:

 Please try the new addon v.centerpoint [0]. It calculates various
 center points for point clouds, lines, and areas. Standard options are
 the geometric mean (center of gravity)

That's the arithmetic mean.

The geometric mean of a set of N values is the Nth root of the product
of the values (i.e. the logarithm of the geometric mean is the
arithmetic mean of the logarithms of the values). It wouldn't be
meaningful to compute the geometric mean of coordinate data.

-- 
Glynn Clements gl...@gclements.plus.com
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-23 Thread Hamish
Hamish wrote:

 see also the v.points.cog addons script:
    http://grasswiki.osgeo.org/wiki/AddOns/GRASS_6#v.points.cog
 
 although I haven't tried it for anything as big as lidar data.

oops, I completely forgot to mention the r.cog addon script too:

  https://trac.osgeo.org/grass/browser/grass-addons/grass6/raster/r.cog

The main goal isn't just center-of-gravity, it's also to reduce the
dataset to a single trend plane. In that way you might subtract out the
principal component of the z-surface to show the underlying signal
(or bring the primary trend into focus). This is e.g. a common task in
working with magnetic data where the Earth's field dominates, but
is relatively smooth compared to the local signal.

A good sample data for it is spearfish's elevation rasters since there
is a strong N-S trend, and for categorical features using the spearfish
fields map + choosing different combinations of categories to move the
center-point around.

I suspect that it could be quite effective with lidar if used with
r.in.{xyz,lidar}, and would be interested to see how the results compare
numerically with MMetz's new module, especially if the 10% trimmed-mean
binning method was used to solve the mean vs. median outliers problem.
(for really big datasets that method has a non-trivial RAM cost though)


r.cog is not listed on the wiki yet since the dip angle and azimuth +
r.plane trend map is still a work in progress -- help welcome, I've
got the 3D pivot point and azimuth seemingly ok, but the mean slope
has proven tricky. My last attempt at it was to split the raster into
three equal-area triangular zones (sort of looking like the South African
flag) then do the COG for each (yes, a recursive script :) to get 3 3D
points, then fit a plane formula to them. IIRC the trouble I was having
with that was if there were large areas of NULL on one side it threw
the whole thing off.


see the example in the v.points.cog help page for an idea of what its
original target application was: placing labels in just the right spot,
while r.cog was written mainly with the idea to determine/create trend-
surfaces.


regards,
Hamish

___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-23 Thread Hamish
Moritz:

 Would be interesting to see results for big data. And AFAIK median is a 
 bit more difficult to do in awk. I imagine that replacing the median by 
 the mean in numpy is no problem (might be a flag to add to v.median).

see the old r.univar.sh shell script for an idea of how to get a median
using unix power tools, it's fairly trivial.

https://trac.osgeo.org/grass/browser/grass/branches/develbranch_6/scripts/r.univar.sh/r.univar.sh#L135


Hamish

___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-23 Thread Moritz Lennert

On 22/09/13 17:06, Markus Metz wrote:

Markus Neteler wrote:

Hi,

I came across this question:

http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets


Please try the new addon v.centerpoint [0]. It calculates various
center points for point clouds, lines, and areas. Standard options are
the geometric mean (center of gravity) and geometric median (more
robust for outliers). There are additional options to calculate center
points for points, lines and areas, explained in the manual. The
geometric medians (points of minimum distance to all other points) are
approximations, but fairly accurate and very fast. I would be
surprised if any other GIS software can calculate these center points.


Thanks a lot ! This is a very nice addition and should be part of core 
grass, IMHO.


Moritz
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-22 Thread Markus Metz
Markus Neteler wrote:
 Hi,

 I came across this question:

 http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets

Please try the new addon v.centerpoint [0]. It calculates various
center points for point clouds, lines, and areas. Standard options are
the geometric mean (center of gravity) and geometric median (more
robust for outliers). There are additional options to calculate center
points for points, lines and areas, explained in the manual. The
geometric medians (points of minimum distance to all other points) are
approximations, but fairly accurate and very fast. I would be
surprised if any other GIS software can calculate these center points.

Markus M

[0] http://grasswiki.osgeo.org/wiki/AddOns/GRASS7/vector#v.centerpoint
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-20 Thread Markus Metz
Glynn Clements wrote:

 Luca Delucchi wrote:

 maybe v.median [0] could help?

 Not for large datasets. First, it requires that the data will fit into
 RAM. Second, numpy.median() sorts the entire array and takes the
 middle value, which is somewhere between O(n.log(n)) for the typical
 case and O(n^2) for the worst case (numpy.median uses the default
 sorting algorithm, which is quicksort).

 See r.quantile for a more efficient approach for large datasets. The
 first pass computes a histogram which allows upper and lower bounds to
 be determined for each quantile. The second pass extracts values which
 lie within those bounds and sorts them.

The approach of v.median and r.quantile regards each dimension
separately which is IMHO not correct in a spatial context. The median
of a point cloud would be a multivariate median for 2 or 3 dimensions.
You would need to sort the points first by the first dimension, then
by the second dimension etc, then pick the coordinates of the mid
point. Alternatively you would need to find the point with the
smallest distance to all other points, which is nightmare to calculate
( (n - 1) x (n - 1) distance calculations).

 Except for pathological cases
 (where the majority of the data lie within a tiny proportion of the
 overall range), only a small fraction of the data are sorted.

Coordinates of large point clouds can easily be such pathological cases.


 In any case: is this question about the mean or the median?

I guess the median could make sense to account for outliers.

A v.centerpoint module could thus have options for point mode (mean,
median, distance) and line mode (mid point, center of gravity). Area
centroids are already calculated by v.category type=area, even though
they are not centers of gravity.

Markus M
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-20 Thread Markus Metz
On Fri, Sep 20, 2013 at 5:38 PM, Markus Metz
markus.metz.gisw...@gmail.com wrote:
 Glynn Clements wrote:

 Luca Delucchi wrote:

 maybe v.median [0] could help?

 Not for large datasets. First, it requires that the data will fit into
 RAM. Second, numpy.median() sorts the entire array and takes the
 middle value, which is somewhere between O(n.log(n)) for the typical
 case and O(n^2) for the worst case (numpy.median uses the default
 sorting algorithm, which is quicksort).

 See r.quantile for a more efficient approach for large datasets. The
 first pass computes a histogram which allows upper and lower bounds to
 be determined for each quantile. The second pass extracts values which
 lie within those bounds and sorts them.

 The approach of v.median and r.quantile regards each dimension
 separately which is IMHO not correct in a spatial context.

This

 The median
 of a point cloud would be a multivariate median for 2 or 3 dimensions.
 You would need to sort the points first by the first dimension, then
 by the second dimension etc, then pick the coordinates of the mid
 point.

is wrong, please ignore.

This

 Alternatively you would need to find the point with the
 smallest distance to all other points, which is nightmare to calculate
 ( (n - 1) x (n - 1) distance calculations).

is correct, but instead of finding the existing point with the
smallest distance, that point can be approximated with less effort.

Markus M
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-19 Thread Moritz Lennert

On 18/09/13 16:24, Markus Metz wrote:

On Wed, Sep 18, 2013 at 11:41 AM, Moritz Lennert
mlenn...@club.worldonline.be  wrote:

On 18/09/13 10:51, Luca Delucchi wrote:


On 17 September 2013 22:10, Markus Netelernete...@osgeo.org   wrote:


Hi,

I came across this question:


http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets

and wondered if this approach would be the fasted:

# http://grass.osgeo.org/sampledata/north_carolina/points.las
v.in.lidar input=points.las output=lidarpoints -o
...
Number of points: 1287775
...

Now I would use
v.univar -d lidarpoints type=point

(still calculating here...)

Is it the best way?



maybe v.median [0] could help?


[1]
http://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.median



Right.

Here's a little test:

$time v.median in=elev_lid792_randpts
638648.50|220378.50


Should be 638648|220378. It seems that numpy gets the median wrong...


When I look at the numbers coming out of v.to.db, there are a series of 
points at X=638648,5 around the 3000 limit, and a series of points at 
Y=220378,5, so I do believe that numpy is right.







real0m0.249s
user0m0.180s
sys 0m0.044s

$time v.to.db elev_lid792_randpts op=coor -p | awk -F'|' 'BEGIN{SUMX=0;
SUMY=0; N=0} {N+=1;SUMX+=$2;SUMY+=$3} END{print SUMX/N, SUMY/N}'
Reading features...
  100%
638544 220339


Should be 638650 220376


Oops, forgot the header line, so N goes up to 6001, instead of 6000.





real0m0.106s
user0m0.100s
sys 0m0.020s

Would be interesting to see results for big data. And AFAIK median is a bit
more difficult to do in awk. I imagine that replacing the median by the mean
in numpy is no problem (might be a flag to add to v.median).


The v.to.db + v.db.univar approach is working just fine, and provides
correct results.


Yes, but it seems a bit overkill to have to go through the attribute 
table to make such a calculation.




About a little module to calculate centroid of polygon, center point
of line and centroid (possibly weighted) for points, that would be
easy because all the components are there, even though there are in
theory alternatives to the current calculation of centroids for
polygons.


And these alternatives are better ?

Mortiz


___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-19 Thread Markus Metz
On Thu, Sep 19, 2013 at 9:15 AM, Moritz Lennert
mlenn...@club.worldonline.be wrote:
 On 18/09/13 16:24, Markus Metz wrote:

 Moritz Lennert  wrote:

 Here's a little test:

 $time v.median in=elev_lid792_randpts
 638648.50|220378.50


 Should be 638648|220378. It seems that numpy gets the median wrong...


 When I look at the numbers coming out of v.to.db, there are a series of
 points at X=638648,5 around the 3000 limit, and a series of points at
 Y=220378,5, so I do believe that numpy is right.

Right, db.univar was printing out with insufficient precision, fixed in r57750.


 About a little module to calculate centroid of polygon, center point
 of line and centroid (possibly weighted) for points, that would be
 easy because all the components are there, even though there are in
 theory alternatives to the current calculation of centroids for
 polygons.


 And these alternatives are better ?

I am not sure. For areas without isles, there is a faster alternative.
For areas with isles, the current approach is fast, but the centroids
might be placed at somewhat unexpected locations (the next best
location inside the area, outside its isles). Here, alternatives would
be much slower.

Markus M
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-19 Thread Markus Neteler
On Thu, Sep 19, 2013 at 2:32 PM, Markus Metz
markus.metz.gisw...@gmail.com wrote:
...
 I am not sure. For areas without isles, there is a faster alternative.
 For areas with isles, the current approach is fast, but the centroids
 might be placed at somewhat unexpected locations (the next best
 location inside the area, outside its isles). Here, alternatives would
 be much slower.

the original exercise was about LiDAR points only... perhaps that
needs to be treated differently.

markusN
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-19 Thread Glynn Clements

Luca Delucchi wrote:

 maybe v.median [0] could help?

Not for large datasets. First, it requires that the data will fit into
RAM. Second, numpy.median() sorts the entire array and takes the
middle value, which is somewhere between O(n.log(n)) for the typical
case and O(n^2) for the worst case (numpy.median uses the default
sorting algorithm, which is quicksort).

See r.quantile for a more efficient approach for large datasets. The
first pass computes a histogram which allows upper and lower bounds to
be determined for each quantile. The second pass extracts values which
lie within those bounds and sorts them. Except for pathological cases
(where the majority of the data lie within a tiny proportion of the
overall range), only a small fraction of the data are sorted.

In any case: is this question about the mean or the median?
Calculating the mean is far simpler, and can easily be done in O(n)
time and O(1) space.

-- 
Glynn Clements gl...@gclements.plus.com
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-18 Thread Moritz Lennert

On 18/09/13 00:10, Markus Neteler wrote:

Hi,

I came across this question:

http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets

and wondered if this approach would be the fasted:

# http://grass.osgeo.org/sampledata/north_carolina/points.las
v.in.lidar input=points.las output=lidarpoints -o
...
Number of points: 1287775
...

Now I would use
v.univar -d lidarpoints type=point


This calculates geometry distances, not the mean coordinate:

$ v.univar -d myhosp type=point

number of primitives: 160
number of non zero distances: 12561
number of zero distances: 0
minimum: 9.16773
maximum: 760776
range: 760767
sum: 2.69047e+09
mean: 214193
mean of absolute values: 214193
population standard deviation: 128505
population variance: 1.65136e+10
population coefficient of variation: 0.599953
sample standard deviation: 128511
sample variance: 1.6515e+10
kurtosis: 0.277564
skewness: 0.801646




Is it the best way?


How about v.to.db -p op=coor and then calculating the mean of the 
coordinates with an ad-hoc script. But that's probably not any faster 
than Hamish' v.out.ascii approach in v.points.cog.


Would probably be a nice little module to have in C: calculate centroid 
of polygon, center point of line and centroid (possibly weighted) for 
points.


Moritz

___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-18 Thread Luca Delucchi
On 17 September 2013 22:10, Markus Neteler nete...@osgeo.org wrote:
 Hi,

 I came across this question:

 http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets

 and wondered if this approach would be the fasted:

 # http://grass.osgeo.org/sampledata/north_carolina/points.las
 v.in.lidar input=points.las output=lidarpoints -o
 ...
 Number of points: 1287775
 ...

 Now I would use
 v.univar -d lidarpoints type=point

 (still calculating here...)

 Is it the best way?


maybe v.median [0] could help?

 Markus

[1] http://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.median

-- 
ciao
Luca

http://gis.cri.fmach.it/delucchi/
www.lucadelu.org
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-18 Thread Moritz Lennert

On 18/09/13 10:51, Luca Delucchi wrote:

On 17 September 2013 22:10, Markus Netelernete...@osgeo.org  wrote:

Hi,

I came across this question:

http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets

and wondered if this approach would be the fasted:

# http://grass.osgeo.org/sampledata/north_carolina/points.las
v.in.lidar input=points.las output=lidarpoints -o
...
Number of points: 1287775
...

Now I would use
v.univar -d lidarpoints type=point

(still calculating here...)

Is it the best way?



maybe v.median [0] could help?


[1] http://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.median


Right.

Here's a little test:

$time v.median in=elev_lid792_randpts
638648.50|220378.50

real0m0.249s
user0m0.180s
sys 0m0.044s

$time v.to.db elev_lid792_randpts op=coor -p | awk -F'|' 'BEGIN{SUMX=0; 
SUMY=0; N=0} {N+=1;SUMX+=$2;SUMY+=$3} END{print SUMX/N, SUMY/N}'

Reading features...
 100%
638544 220339

real0m0.106s
user0m0.100s
sys 0m0.020s

Would be interesting to see results for big data. And AFAIK median is a 
bit more difficult to do in awk. I imagine that replacing the median by 
the mean in numpy is no problem (might be a flag to add to v.median).



I didn't try v.points.cog as that actually creates a new vector map.

Moritz

___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-18 Thread Markus Metz
On Wed, Sep 18, 2013 at 11:41 AM, Moritz Lennert
mlenn...@club.worldonline.be wrote:
 On 18/09/13 10:51, Luca Delucchi wrote:

 On 17 September 2013 22:10, Markus Netelernete...@osgeo.org  wrote:

 Hi,

 I came across this question:


 http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets

 and wondered if this approach would be the fasted:

 # http://grass.osgeo.org/sampledata/north_carolina/points.las
 v.in.lidar input=points.las output=lidarpoints -o
 ...
 Number of points: 1287775
 ...

 Now I would use
 v.univar -d lidarpoints type=point

 (still calculating here...)

 Is it the best way?


 maybe v.median [0] could help?


 [1]
 http://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.median


 Right.

 Here's a little test:

 $time v.median in=elev_lid792_randpts
 638648.50|220378.50

Should be 638648|220378. It seems that numpy gets the median wrong...


 real0m0.249s
 user0m0.180s
 sys 0m0.044s

 $time v.to.db elev_lid792_randpts op=coor -p | awk -F'|' 'BEGIN{SUMX=0;
 SUMY=0; N=0} {N+=1;SUMX+=$2;SUMY+=$3} END{print SUMX/N, SUMY/N}'
 Reading features...
  100%
 638544 220339

Should be 638650 220376


 real0m0.106s
 user0m0.100s
 sys 0m0.020s

 Would be interesting to see results for big data. And AFAIK median is a bit
 more difficult to do in awk. I imagine that replacing the median by the mean
 in numpy is no problem (might be a flag to add to v.median).

The v.to.db + v.db.univar approach is working just fine, and provides
correct results.

About a little module to calculate centroid of polygon, center point
of line and centroid (possibly weighted) for points, that would be
easy because all the components are there, even though there are in
theory alternatives to the current calculation of centroids for
polygons.

Markus M
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


[GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-17 Thread Markus Neteler
Hi,

I came across this question:

http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets

and wondered if this approach would be the fasted:

# http://grass.osgeo.org/sampledata/north_carolina/points.las
v.in.lidar input=points.las output=lidarpoints -o
...
Number of points: 1287775
...

Now I would use
v.univar -d lidarpoints type=point

(still calculating here...)

Is it the best way?

Markus
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?

2013-09-17 Thread Hamish
Markus Neteler wrote:

 I came across this question:
 
 http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets

so wants to find the average coordinate?

 and wondered if this approach would be the fasted:
 
 # http://grass.osgeo.org/sampledata/north_carolina/points.las
 v.in.lidar input=points.las output=lidarpoints -o
 ...
 Number of points: 1287775
 ...
 
 Now I would use
 v.univar -d lidarpoints type=point
 
 (still calculating here...)
 
 Is it the best way?


see also the v.points.cog addons script:
   http://grasswiki.osgeo.org/wiki/AddOns/GRASS_6#v.points.cog

although I haven't tried it for anything as big as lidar data.


Hamish

___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev