As several people have commented, a single range for a query can produce a lot of false positives that need to be filtered out. I made this visualization a while back that lets you see this behavior interactively (click-drag a bounding box):

http://bl.ocks.org/jaredwinick/5073432
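
To make the behavior concrete, here is a minimal, self-contained sketch (not the visualization's code and not GeoMesa's encoding; the grid size and query box are made up for illustration). It interleaves two small grid coordinates into a Z-order value, then walks the single range between the codes of the box's corners and counts how many of those cells actually fall inside the box:

    public class ZOrderFalsePositives {

        // Interleave the low 16 bits of x and y (x in even bit positions, y in odd).
        static long interleave(int x, int y) {
            long z = 0;
            for (int i = 0; i < 16; i++) {
                z |= ((long) (x >> i) & 1L) << (2 * i);
                z |= ((long) (y >> i) & 1L) << (2 * i + 1);
            }
            return z;
        }

        public static void main(String[] args) {
            // Query box on an 8x8 grid: x in [2,3], y in [2,5] (8 cells).
            int xMin = 2, xMax = 3, yMin = 2, yMax = 5;
            long lo = interleave(xMin, yMin), hi = interleave(xMax, yMax);

            long total = 0, inBox = 0;
            for (long z = lo; z <= hi; z++) {
                // De-interleave to recover the cell coordinates.
                int x = 0, y = 0;
                for (int i = 0; i < 16; i++) {
                    x |= (int) ((z >> (2 * i)) & 1L) << i;
                    y |= (int) ((z >> (2 * i + 1)) & 1L) << i;
                }
                total++;
                if (x >= xMin && x <= xMax && y >= yMin && y <= yMax) inBox++;
            }
            System.out.println(total + " cells scanned, " + inBox + " actually in the box");
        }
    }

For that box it prints "28 cells scanned, 8 actually in the box", and the gap between the naive corner-to-corner range and the true result generally grows with precision and with boxes that straddle high-order curve boundaries.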

On Sun, Jul 27, 2014 at 2:15 PM, Anthony Fox <[email protected]> wrote:

> My first thought was just something simple for a first pass - lat/lon -> a single lexicographic dimension - as it would cover most basic use cases. Precision (number of bits in the encoding) could be an arg or a config variable. For WITHIN/INTERSECTS topological predicates, we need to decompose the query geometry into the (possibly/probably non-contiguous) 1D ranges that cover the region in question. GeoMesa has an algorithm to quickly perform this decomposition: it computes the minimum number of geohash prefixes, at different resolutions, that fully cover the query polygon. Basically, it recursively traverses through levels of geohash resolution, prioritizing rectangles that intersect the region and discarding non-intersecting rectangles at the lowest precisions, to produce a sequence of ranges to scan over. Fully contained rectangles are discovered at their lowest resolution, at which point the algorithm pops the stack and searches the next prioritized prefix. I think something like this would definitely need to be ported over and included in a lexicoder implementation to make it useful. Also, rather than materialize the entire set of ranges in memory, we can either return a lazy iterator of prefixes that can be fed into a scanner in batches, or we can have a short-circuit config that tunes the amount of slop that's tolerable and cuts off the traversal at a certain level of precision. GeoMesa uses something like the former on attribute indexes to coordinate parallel scanners on separate index and record tables.
>
> Thoughts? I'm inclined to keep the implementation to the bare minimum necessary for the basic use cases (lat/lon and bbox queries), though I do think a general dimensionality-reducing lexicoder would be very useful.
>
> On Fri, Jul 25, 2014 at 6:51 PM, Chris Bennight <[email protected]> wrote:
>
>> A couple of related issues come up when considering implementing a dimensionality-reducing encoding -- just want to toss those out to see what people think the interface might look like.
>>
>> There are a couple of aspects that could be brought in here, but let's keep it simple and consider the original question: (lat/lon) -> number.
>>
>> --Desired precision of the binning process
>> The more bits we add to the z-curve, the more precise our comparison - i.e. a 63-bit key would have more "locations" to sort by than a 24-bit key.
>> Would you see a reasonable default getting picked, make this user configurable, or both? (i.e. default to a value, extended options with a new constructor?)
>>
>> --Semantics for turning two lat/long pairs into a range
>> I'm extrapolating here, but the only reason I see that locality matters is if we want to preserve locality for range searches. The internal implementation of the encoding/lexicoding process is going to directly impact the implementation of the range query.
>> Now sure, someone could encode the lower-left point, encode the upper-right point, and construct a range out of that to pass for a scan, but that's going to be wildly inefficient in most cases. See: https://dl.dropboxusercontent.com/u/6649380/bbox.png
>> If we just lexicode the lower left and upper right, we traverse across the entire curve - hitting lots of areas that aren't actually in the original range. Now we can turn a single 2D range into a set of 1D ranges.
>> There is some potential tuning here now, as the algorithm has a tradeoff between the time to compute the ranges (and the number of ranges) vs. "slop" (the inclusion of ranges which aren't actually in the original query).
>> Would you see a static method, perhaps on the z-curve lexicoder, that returns a series of ranges based on an input window? Some other mechanism? And in the case of "slop" - would we just document that the ranges could actually include unexpected values, or would we always fully decompose?
>>
>> ------------------
>>
>> The other, "making it more complicated" questions would probably revolve around generalizing this to a multi-dimensional lexicoder. I would expect the WGS84 (lat/long) encoder would just be a 2D instance with -180/+180 and -90/+90 bounds. There are probably completely different cases where it would be useful to have a locality-sensitive hash.
>> But with the big caveat that this requires more methods - I now need a way of defining a range for each of the dimensions (which before were defined as lat/lon), and I need an interface/class to pass those dimensions to the encoder - and I should be able to use those same definitions to decode my values on the way out.
>>
>> I'm not trying to make a mountain out of a molehill - one could definitely pick some sane defaults, document behaviour, and put in a WGS84-specific implementation. I'm just thinking about the minimum viable bit that's needed for this to be useful - and I suspect it's the range decomposition piece, as I suspect *everyone* would really need that (if they cared about locality in the first place).
>>
>> My main question here would be to what extent you would see this going: a one-off for WGS84, or a more generic multi-dimensional lexicoder? And in either case, what would the thoughts/consensus be for implementation of the additional methods (and possibly classes) required?
>>
>> On Fri, Jul 25, 2014 at 12:37 PM, Sean Busbey <[email protected]> wrote:
>>
>>> On Fri, Jul 25, 2014 at 8:45 AM, Anthony F <[email protected]> wrote:
>>>
>>>> GeoMesa plans to update to 1.6 as soon as we get past a 1.0 RELEASE (which is waiting on approval by the Eclipse Foundation's IP review process and should happen in a month or so). I think we could break out a z-curve lexicoder and contribute before then, though. I'll poke around at the lexicoder stuff in 1.6 and see what it would take to use that interface now.
>>>
>>> That sounds great Anthony. Let us know if we can help with anything!
>>>
>>> --
>>> Sean
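
The range decomposition discussed in the quoted thread is the piece most users would end up needing. A rough sketch of the idea (a toy quadtree/Z-order decomposition, not GeoMesa's geohash-prefix algorithm; the names and the depth cutoff are illustrative) might look like:

    import java.util.ArrayList;
    import java.util.List;

    // Toy decomposition of a query box into Z-order ranges: discard quadrants
    // disjoint from the box, emit fully contained quadrants as prefix ranges,
    // and recurse into partial overlaps until a maximum depth (the "slop" knob).
    public class ZRangeDecomposer {

        public static class ZRange {
            public final long lo, hi;  // inclusive Z-order bounds
            public ZRange(long lo, long hi) { this.lo = lo; this.hi = hi; }
        }

        // Query box in cell coordinates at the target resolution (2^maxDepth cells per axis).
        public static List<ZRange> decompose(int qxMin, int qyMin, int qxMax, int qyMax, int maxDepth) {
            List<ZRange> ranges = new ArrayList<>();
            visit(0L, 0, 0, 0, maxDepth, qxMin, qyMin, qxMax, qyMax, ranges);
            return ranges;
        }

        // prefix: Z-order bits accumulated so far; (x, y): origin of this quadrant.
        private static void visit(long prefix, int x, int y, int depth, int maxDepth,
                                  int qxMin, int qyMin, int qxMax, int qyMax, List<ZRange> out) {
            int cellSize = 1 << (maxDepth - depth);        // cells per axis covered at this depth
            int xMax = x + cellSize - 1, yMax = y + cellSize - 1;

            // Disjoint from the query box: discard.
            if (xMax < qxMin || x > qxMax || yMax < qyMin || y > qyMax) return;

            int remainingBits = 2 * (maxDepth - depth);
            boolean contained = x >= qxMin && xMax <= qxMax && y >= qyMin && yMax <= qyMax;
            // Fully contained, or at the precision cutoff: emit the whole prefix as one range.
            if (contained || depth == maxDepth) {
                long lo = prefix << remainingBits;
                out.add(new ZRange(lo, lo | ((1L << remainingBits) - 1)));
                return;
            }

            // Partial overlap: recurse into the four child quadrants in Z order.
            int half = cellSize >> 1;
            visit((prefix << 2),     x,        y,        depth + 1, maxDepth, qxMin, qyMin, qxMax, qyMax, out);
            visit((prefix << 2) | 1, x + half, y,        depth + 1, maxDepth, qxMin, qyMin, qxMax, qyMax, out);
            visit((prefix << 2) | 2, x,        y + half, depth + 1, maxDepth, qxMin, qyMin, qxMax, qyMax, out);
            visit((prefix << 2) | 3, x + half, y + half, depth + 1, maxDepth, qxMin, qyMin, qxMax, qyMax, out);
        }
    }

On the same 8x8 example as above (box x in [2,3], y in [2,5], maxDepth 3), decompose returns the ranges [12,15] and [36,39] instead of the single [12,39] scan. A real implementation would also want to merge adjacent ranges, cap the total number of ranges, or hand prefixes out lazily in batches, along the lines described in the quoted thread.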

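If the answer lands on the more generic multi-dimensional lexicoder, the per-dimension definition Chris asks about could be as small as a bounds-plus-bits descriptor shared by the encoder and decoder. A hypothetical sketch of that shape (the names and interface are invented here, not part of Accumulo's Lexicoder API):

    // Hypothetical per-dimension descriptor; not an existing Accumulo or GeoMesa API.
    public class DimensionRange {
        final double min, max;  // bounds of the dimension, e.g. -180/+180 for longitude
        final int bits;         // precision (number of bits) allotted to the dimension

        public DimensionRange(double min, double max, int bits) {
            this.min = min; this.max = max; this.bits = bits;
        }

        // Map a value in [min, max] to an unsigned cell index of `bits` bits.
        public long normalize(double value) {
            double fraction = (value - min) / (max - min);
            long cells = 1L << bits;
            long cell = (long) (fraction * cells);
            return Math.min(Math.max(cell, 0L), cells - 1);  // clamp the edges
        }
    }

A WGS84 lexicoder would then just be the 2D instance described in the quoted thread - e.g. a DimensionRange(-180, 180, 31) and a DimensionRange(-90, 90, 31) fed to a generic bit-interleaver - while other domains supply their own bounds and precision.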