Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
Glynn Clements wrote: Markus Metz wrote: Please try the new addon v.centerpoint [0]. It calculates various center points for point clouds, lines, and areas. Standard options are the geometric mean (center of gravity) That's the arithmetic mean. The geometric mean of a set of N values is the Nth root of the product of the values (i.e. the logarithm of the geometric mean is the arithmetic mean of the logarithms of the values). It wouldn't be meaningful to compute the geometric mean of coordinate data. Right, I fixed the documentation. Internally, the arithmetic mean has been used already to calculate centers of gravity. Markus M ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
Markus Metz wrote: Please try the new addon v.centerpoint [0]. It calculates various center points for point clouds, lines, and areas. Standard options are the geometric mean (center of gravity) That's the arithmetic mean. The geometric mean of a set of N values is the Nth root of the product of the values (i.e. the logarithm of the geometric mean is the arithmetic mean of the logarithms of the values). It wouldn't be meaningful to compute the geometric mean of coordinate data. -- Glynn Clements gl...@gclements.plus.com ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
Hamish wrote: see also the v.points.cog addons script: http://grasswiki.osgeo.org/wiki/AddOns/GRASS_6#v.points.cog although I haven't tried it for anything as big as lidar data. oops, I completely forgot to mention the r.cog addon script too: https://trac.osgeo.org/grass/browser/grass-addons/grass6/raster/r.cog The main goal isn't just center-of-gravity, it's also to reduce the dataset to a single trend plane. In that way you might subtract out the principal component of the z-surface to show the underlying signal (or bring the primary trend into focus). This is e.g. a common task in working with magnetic data where the Earth's field dominates, but is relatively smooth compared to the local signal. A good sample data for it is spearfish's elevation rasters since there is a strong N-S trend, and for categorical features using the spearfish fields map + choosing different combinations of categories to move the center-point around. I suspect that it could be quite effective with lidar if used with r.in.{xyz,lidar}, and would be interested to see how the results compare numerically with MMetz's new module, especially if the 10% trimmed-mean binning method was used to solve the mean vs. median outliers problem. (for really big datasets that method has a non-trivial RAM cost though) r.cog is not listed on the wiki yet since the dip angle and azimuth + r.plane trend map is still a work in progress -- help welcome, I've got the 3D pivot point and azimuth seemingly ok, but the mean slope has proven tricky. My last attempt at it was to split the raster into three equal-area triangular zones (sort of looking like the South African flag) then do the COG for each (yes, a recursive script :) to get 3 3D points, then fit a plane formula to them. IIRC the trouble I was having with that was if there were large areas of NULL on one side it threw the whole thing off. see the example in the v.points.cog help page for an idea of what its original target application was: placing labels in just the right spot, while r.cog was written mainly with the idea to determine/create trend- surfaces. regards, Hamish ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
Moritz: Would be interesting to see results for big data. And AFAIK median is a bit more difficult to do in awk. I imagine that replacing the median by the mean in numpy is no problem (might be a flag to add to v.median). see the old r.univar.sh shell script for an idea of how to get a median using unix power tools, it's fairly trivial. https://trac.osgeo.org/grass/browser/grass/branches/develbranch_6/scripts/r.univar.sh/r.univar.sh#L135 Hamish ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
On 22/09/13 17:06, Markus Metz wrote: Markus Neteler wrote: Hi, I came across this question: http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets Please try the new addon v.centerpoint [0]. It calculates various center points for point clouds, lines, and areas. Standard options are the geometric mean (center of gravity) and geometric median (more robust for outliers). There are additional options to calculate center points for points, lines and areas, explained in the manual. The geometric medians (points of minimum distance to all other points) are approximations, but fairly accurate and very fast. I would be surprised if any other GIS software can calculate these center points. Thanks a lot ! This is a very nice addition and should be part of core grass, IMHO. Moritz ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
Markus Neteler wrote: Hi, I came across this question: http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets Please try the new addon v.centerpoint [0]. It calculates various center points for point clouds, lines, and areas. Standard options are the geometric mean (center of gravity) and geometric median (more robust for outliers). There are additional options to calculate center points for points, lines and areas, explained in the manual. The geometric medians (points of minimum distance to all other points) are approximations, but fairly accurate and very fast. I would be surprised if any other GIS software can calculate these center points. Markus M [0] http://grasswiki.osgeo.org/wiki/AddOns/GRASS7/vector#v.centerpoint ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
Glynn Clements wrote: Luca Delucchi wrote: maybe v.median [0] could help? Not for large datasets. First, it requires that the data will fit into RAM. Second, numpy.median() sorts the entire array and takes the middle value, which is somewhere between O(n.log(n)) for the typical case and O(n^2) for the worst case (numpy.median uses the default sorting algorithm, which is quicksort). See r.quantile for a more efficient approach for large datasets. The first pass computes a histogram which allows upper and lower bounds to be determined for each quantile. The second pass extracts values which lie within those bounds and sorts them. The approach of v.median and r.quantile regards each dimension separately which is IMHO not correct in a spatial context. The median of a point cloud would be a multivariate median for 2 or 3 dimensions. You would need to sort the points first by the first dimension, then by the second dimension etc, then pick the coordinates of the mid point. Alternatively you would need to find the point with the smallest distance to all other points, which is nightmare to calculate ( (n - 1) x (n - 1) distance calculations). Except for pathological cases (where the majority of the data lie within a tiny proportion of the overall range), only a small fraction of the data are sorted. Coordinates of large point clouds can easily be such pathological cases. In any case: is this question about the mean or the median? I guess the median could make sense to account for outliers. A v.centerpoint module could thus have options for point mode (mean, median, distance) and line mode (mid point, center of gravity). Area centroids are already calculated by v.category type=area, even though they are not centers of gravity. Markus M ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
On Fri, Sep 20, 2013 at 5:38 PM, Markus Metz markus.metz.gisw...@gmail.com wrote: Glynn Clements wrote: Luca Delucchi wrote: maybe v.median [0] could help? Not for large datasets. First, it requires that the data will fit into RAM. Second, numpy.median() sorts the entire array and takes the middle value, which is somewhere between O(n.log(n)) for the typical case and O(n^2) for the worst case (numpy.median uses the default sorting algorithm, which is quicksort). See r.quantile for a more efficient approach for large datasets. The first pass computes a histogram which allows upper and lower bounds to be determined for each quantile. The second pass extracts values which lie within those bounds and sorts them. The approach of v.median and r.quantile regards each dimension separately which is IMHO not correct in a spatial context. This The median of a point cloud would be a multivariate median for 2 or 3 dimensions. You would need to sort the points first by the first dimension, then by the second dimension etc, then pick the coordinates of the mid point. is wrong, please ignore. This Alternatively you would need to find the point with the smallest distance to all other points, which is nightmare to calculate ( (n - 1) x (n - 1) distance calculations). is correct, but instead of finding the existing point with the smallest distance, that point can be approximated with less effort. Markus M ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
On 18/09/13 16:24, Markus Metz wrote: On Wed, Sep 18, 2013 at 11:41 AM, Moritz Lennert mlenn...@club.worldonline.be wrote: On 18/09/13 10:51, Luca Delucchi wrote: On 17 September 2013 22:10, Markus Netelernete...@osgeo.org wrote: Hi, I came across this question: http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets and wondered if this approach would be the fasted: # http://grass.osgeo.org/sampledata/north_carolina/points.las v.in.lidar input=points.las output=lidarpoints -o ... Number of points: 1287775 ... Now I would use v.univar -d lidarpoints type=point (still calculating here...) Is it the best way? maybe v.median [0] could help? [1] http://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.median Right. Here's a little test: $time v.median in=elev_lid792_randpts 638648.50|220378.50 Should be 638648|220378. It seems that numpy gets the median wrong... When I look at the numbers coming out of v.to.db, there are a series of points at X=638648,5 around the 3000 limit, and a series of points at Y=220378,5, so I do believe that numpy is right. real0m0.249s user0m0.180s sys 0m0.044s $time v.to.db elev_lid792_randpts op=coor -p | awk -F'|' 'BEGIN{SUMX=0; SUMY=0; N=0} {N+=1;SUMX+=$2;SUMY+=$3} END{print SUMX/N, SUMY/N}' Reading features... 100% 638544 220339 Should be 638650 220376 Oops, forgot the header line, so N goes up to 6001, instead of 6000. real0m0.106s user0m0.100s sys 0m0.020s Would be interesting to see results for big data. And AFAIK median is a bit more difficult to do in awk. I imagine that replacing the median by the mean in numpy is no problem (might be a flag to add to v.median). The v.to.db + v.db.univar approach is working just fine, and provides correct results. Yes, but it seems a bit overkill to have to go through the attribute table to make such a calculation. About a little module to calculate centroid of polygon, center point of line and centroid (possibly weighted) for points, that would be easy because all the components are there, even though there are in theory alternatives to the current calculation of centroids for polygons. And these alternatives are better ? Mortiz ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
On Thu, Sep 19, 2013 at 9:15 AM, Moritz Lennert mlenn...@club.worldonline.be wrote: On 18/09/13 16:24, Markus Metz wrote: Moritz Lennert wrote: Here's a little test: $time v.median in=elev_lid792_randpts 638648.50|220378.50 Should be 638648|220378. It seems that numpy gets the median wrong... When I look at the numbers coming out of v.to.db, there are a series of points at X=638648,5 around the 3000 limit, and a series of points at Y=220378,5, so I do believe that numpy is right. Right, db.univar was printing out with insufficient precision, fixed in r57750. About a little module to calculate centroid of polygon, center point of line and centroid (possibly weighted) for points, that would be easy because all the components are there, even though there are in theory alternatives to the current calculation of centroids for polygons. And these alternatives are better ? I am not sure. For areas without isles, there is a faster alternative. For areas with isles, the current approach is fast, but the centroids might be placed at somewhat unexpected locations (the next best location inside the area, outside its isles). Here, alternatives would be much slower. Markus M ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
On Thu, Sep 19, 2013 at 2:32 PM, Markus Metz markus.metz.gisw...@gmail.com wrote: ... I am not sure. For areas without isles, there is a faster alternative. For areas with isles, the current approach is fast, but the centroids might be placed at somewhat unexpected locations (the next best location inside the area, outside its isles). Here, alternatives would be much slower. the original exercise was about LiDAR points only... perhaps that needs to be treated differently. markusN ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
Luca Delucchi wrote: maybe v.median [0] could help? Not for large datasets. First, it requires that the data will fit into RAM. Second, numpy.median() sorts the entire array and takes the middle value, which is somewhere between O(n.log(n)) for the typical case and O(n^2) for the worst case (numpy.median uses the default sorting algorithm, which is quicksort). See r.quantile for a more efficient approach for large datasets. The first pass computes a histogram which allows upper and lower bounds to be determined for each quantile. The second pass extracts values which lie within those bounds and sorts them. Except for pathological cases (where the majority of the data lie within a tiny proportion of the overall range), only a small fraction of the data are sorted. In any case: is this question about the mean or the median? Calculating the mean is far simpler, and can easily be done in O(n) time and O(1) space. -- Glynn Clements gl...@gclements.plus.com ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
On 18/09/13 00:10, Markus Neteler wrote: Hi, I came across this question: http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets and wondered if this approach would be the fasted: # http://grass.osgeo.org/sampledata/north_carolina/points.las v.in.lidar input=points.las output=lidarpoints -o ... Number of points: 1287775 ... Now I would use v.univar -d lidarpoints type=point This calculates geometry distances, not the mean coordinate: $ v.univar -d myhosp type=point number of primitives: 160 number of non zero distances: 12561 number of zero distances: 0 minimum: 9.16773 maximum: 760776 range: 760767 sum: 2.69047e+09 mean: 214193 mean of absolute values: 214193 population standard deviation: 128505 population variance: 1.65136e+10 population coefficient of variation: 0.599953 sample standard deviation: 128511 sample variance: 1.6515e+10 kurtosis: 0.277564 skewness: 0.801646 Is it the best way? How about v.to.db -p op=coor and then calculating the mean of the coordinates with an ad-hoc script. But that's probably not any faster than Hamish' v.out.ascii approach in v.points.cog. Would probably be a nice little module to have in C: calculate centroid of polygon, center point of line and centroid (possibly weighted) for points. Moritz ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
On 17 September 2013 22:10, Markus Neteler nete...@osgeo.org wrote: Hi, I came across this question: http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets and wondered if this approach would be the fasted: # http://grass.osgeo.org/sampledata/north_carolina/points.las v.in.lidar input=points.las output=lidarpoints -o ... Number of points: 1287775 ... Now I would use v.univar -d lidarpoints type=point (still calculating here...) Is it the best way? maybe v.median [0] could help? Markus [1] http://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.median -- ciao Luca http://gis.cri.fmach.it/delucchi/ www.lucadelu.org ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
On 18/09/13 10:51, Luca Delucchi wrote: On 17 September 2013 22:10, Markus Netelernete...@osgeo.org wrote: Hi, I came across this question: http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets and wondered if this approach would be the fasted: # http://grass.osgeo.org/sampledata/north_carolina/points.las v.in.lidar input=points.las output=lidarpoints -o ... Number of points: 1287775 ... Now I would use v.univar -d lidarpoints type=point (still calculating here...) Is it the best way? maybe v.median [0] could help? [1] http://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.median Right. Here's a little test: $time v.median in=elev_lid792_randpts 638648.50|220378.50 real0m0.249s user0m0.180s sys 0m0.044s $time v.to.db elev_lid792_randpts op=coor -p | awk -F'|' 'BEGIN{SUMX=0; SUMY=0; N=0} {N+=1;SUMX+=$2;SUMY+=$3} END{print SUMX/N, SUMY/N}' Reading features... 100% 638544 220339 real0m0.106s user0m0.100s sys 0m0.020s Would be interesting to see results for big data. And AFAIK median is a bit more difficult to do in awk. I imagine that replacing the median by the mean in numpy is no problem (might be a flag to add to v.median). I didn't try v.points.cog as that actually creates a new vector map. Moritz ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
On Wed, Sep 18, 2013 at 11:41 AM, Moritz Lennert mlenn...@club.worldonline.be wrote: On 18/09/13 10:51, Luca Delucchi wrote: On 17 September 2013 22:10, Markus Netelernete...@osgeo.org wrote: Hi, I came across this question: http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets and wondered if this approach would be the fasted: # http://grass.osgeo.org/sampledata/north_carolina/points.las v.in.lidar input=points.las output=lidarpoints -o ... Number of points: 1287775 ... Now I would use v.univar -d lidarpoints type=point (still calculating here...) Is it the best way? maybe v.median [0] could help? [1] http://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.median Right. Here's a little test: $time v.median in=elev_lid792_randpts 638648.50|220378.50 Should be 638648|220378. It seems that numpy gets the median wrong... real0m0.249s user0m0.180s sys 0m0.044s $time v.to.db elev_lid792_randpts op=coor -p | awk -F'|' 'BEGIN{SUMX=0; SUMY=0; N=0} {N+=1;SUMX+=$2;SUMY+=$3} END{print SUMX/N, SUMY/N}' Reading features... 100% 638544 220339 Should be 638650 220376 real0m0.106s user0m0.100s sys 0m0.020s Would be interesting to see results for big data. And AFAIK median is a bit more difficult to do in awk. I imagine that replacing the median by the mean in numpy is no problem (might be a flag to add to v.median). The v.to.db + v.db.univar approach is working just fine, and provides correct results. About a little module to calculate centroid of polygon, center point of line and centroid (possibly weighted) for points, that would be easy because all the components are there, even though there are in theory alternatives to the current calculation of centroids for polygons. Markus M ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
[GRASS-dev] How to calculate mean coordinates from big point datasets?
Hi, I came across this question: http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets and wondered if this approach would be the fasted: # http://grass.osgeo.org/sampledata/north_carolina/points.las v.in.lidar input=points.las output=lidarpoints -o ... Number of points: 1287775 ... Now I would use v.univar -d lidarpoints type=point (still calculating here...) Is it the best way? Markus ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] How to calculate mean coordinates from big point datasets?
Markus Neteler wrote: I came across this question: http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets so wants to find the average coordinate? and wondered if this approach would be the fasted: # http://grass.osgeo.org/sampledata/north_carolina/points.las v.in.lidar input=points.las output=lidarpoints -o ... Number of points: 1287775 ... Now I would use v.univar -d lidarpoints type=point (still calculating here...) Is it the best way? see also the v.points.cog addons script: http://grasswiki.osgeo.org/wiki/AddOns/GRASS_6#v.points.cog although I haven't tried it for anything as big as lidar data. Hamish ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev