[email protected] wrote on 07/02/2010 12:26:43 AM:
> GeoHash doesn't cut it? It's space partitioned, just not as balanced > as a kd-tree, but for points... no? I'll look up my info on the GiST > work that was done, but note that it didn't make it into 9.0, but into > 9.1, and as I recall Oleg and co did not seem to think the kd > implementation was going to be very effective. At this point I'm just considering solutions on the conceptual level to identify promising candidates. GeoHash is promising. However, my issue with GeoHash is that it linearizes a 2d problem, much like space-filling curves. (i.e., you never know how far away a "close" geohash-value is; conversely: identifying all the "near" geohashes presents an issue). Besides: GeoHash is global and our study is regional. 2/3 or more of the hash values will be empty, while the others will be chock full of points. Also, our region extends from 20-80 degrees N. GeoHashes aren't a constant-distance metric, they're constant-angle. We're going to have fat cells toward the south and skinny ones in the north. The most conceptually pleasing approach for this proximity-based function would be to operate on the kd tree itself, minimizing the number of searches. At most, a "proximity" search from a known point in the tree may have to pop up a couple of levels (to check the neighbors), and the _same_ proximity search could be re-used for all nodes under the same subtree. Directly navigating a spatial index which retains the spatial identity of the indexed points seems like the most efficient way to do this. I guess in database terms, I'm talking about implementing a new "cursor" which leverages a hierchical structure like the kd-tree (to walk thru the points using the proximity information in the tree). Simultaneously: implementing a caching proximity search on the tree itself which can re-use previous searches from the same subtree. That way, proximity searches from points under the same subtree needn't be repeated. This is perhaps not a topic for the users list. If I flesh the idea out more, I'll use the wiki. We've been granted a reprieve for the moment by selecting a "case study" which needs only 3 million out of 90 million points. However, we're eventually going to have to suck it up and process everything. So we've got a bit of time to mull it over; and/or review the literature for smart ways to use a kd-tree/other scheme. Bryce
_______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users
