Sure, I don't see why not. Ultimately this is more or less the same thing I proposed. You end up with a slightly different way of encoding a point in space into a rough geographical area. Whether you encode them as a tree structure or some prefix of a geohash is a matter of convenience. I'm not sure if there's any performance advantage to using geohashes, from a Cassandra data model & query perspective, as I haven't spent much time with them. Maybe someone who's done this can chime in.
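
To make the "prefix of a geohash" idea concrete, here is a minimal Python sketch of the bucketing Jim describes: a short geohash prefix as the partition key and the full hash as the clustering key. The function names, the prefix length of 4, and the precision of 9 are illustrative assumptions, not something anyone in this thread has benchmarked.

```python
# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


def bucket_key(lat, lon, prefix_len=4, precision=9):
    """Return (partition_key, clustering_key) for a point.

    Hypothetical layout: the first prefix_len characters pick the
    partition, and the full hash orders rows inside it.
    """
    full = geohash_encode(lat, lon, precision)
    return full[:prefix_len], full


if __name__ == "__main__":
    print(bucket_key(57.64911, 10.40744))
```

Points that share a prefix land in the same partition, so a range over the full hash inside one prefix stays on one replica set. A search area that straddles a prefix boundary still needs one query per prefix it touches.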
On Tue, May 9, 2017 at 1:16 PM Jim Ancona <j...@anconafamily.com> wrote:

> Couldn't you use a bucketing strategy for the hash value, much like with
> time series data? That is, choose a partition key granularity that puts a
> reasonable number of rows in a partition, with the actual hash being the
> clustering key. Then ranges that fall within the partition key granularity
> could be efficiently queried.
>
> Jim
>
> On Tue, May 9, 2017 at 11:19 AM, Jon Haddad <jonathan.had...@gmail.com>
> wrote:
>
>> The problem with using geohashes is that you can’t efficiently do ranges
>> with random token distribution. So even if your scalar values are close to
>> each other numerically they’ll likely end up on different nodes, and you
>> end up doing a scatter gather.
>>
>> If the goal is to provide a scalable solution, building a table that
>> functions as an R-Tree or Quad Tree is the only way I know that can solve
>> the problem without scanning the entire cluster.
>>
>> Jon
>>
>> On May 9, 2017, at 10:11 AM, Jim Ancona <j...@anconafamily.com> wrote:
>>
>> There are clever ways to encode coordinates into a single scalar value
>> where points that are close on a surface are also close in value, making
>> queries efficient. Examples are Geohash
>> <https://en.wikipedia.org/wiki/Geohash> and Google's S2
>> <https://docs.google.com/presentation/d/1Hl4KapfAENAOf4gv-pSngKwvS_jwNVHRPZTTDzXXn6Q/view#slide=id.i0>.
>> As Jon mentions, this puts more work on the client, but might give you a
>> lot of querying flexibility when using Cassandra.
>>
>> Jim
>>
>> On Mon, May 8, 2017 at 11:13 PM, Jon Haddad <jonathan.had...@gmail.com>
>> wrote:
>>
>>> It gets a little tricky when you try to add in the coordinates to the
>>> clustering key if you want to do operations that are more complex. For
>>> instance, finding all the elements within a radius of point (x,y) isn’t
>>> particularly fun with Cassandra. I recommend moving that logic into the
>>> application.
>>>
>>> > On May 8, 2017, at 10:06 PM, kurt greaves <k...@instaclustr.com> wrote:
>>> >
>>> > Note that will not give you the desired range queries of 0 <= x <= 1
>>> > and 0 <= y <= 1.
>>> >
>>> > Something akin to Jon's solution could give you those range queries if
>>> > you made the x and y components part of the clustering key.
>>> >
>>> > For example, a space of (1,1) could contain all x,y coordinates where
>>> > x and y are > 0 and <= 1. You would then have a table like:
>>> >
>>> > CREATE TABLE geospatial (
>>> >     space text,
>>> >     x double,
>>> >     y double,
>>> >     item text,
>>> >     -- Metadata column types were not given; text is assumed here.
>>> >     m1 text,
>>> >     m2 text,
>>> >     m3 text,
>>> >     m4 text,
>>> >     m5 text,
>>> >     PRIMARY KEY ((space), x, y, m1, m2, m3, m4, m5)
>>> > );
>>> >
>>> > A query of
>>> > SELECT * FROM geospatial WHERE space = '1,1' AND x < 1 AND x > 0.5 AND y < 0.2 AND y > 0.1;
>>> > should yield all x and y pairs and their distinct metadata. Or
>>> > something like that anyway.
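
For the "space" partition key in the quoted table, a minimal Python sketch of the client-side part might look like the following. It assumes a uniform grid where cell "1,1" covers 0 < x <= 1 and 0 < y <= 1, as in kurt's example; the function names and the `cell` parameter are hypothetical. As far as I know, CQL only accepts a range restriction on the last restricted clustering column unless ALLOW FILTERING is used, so the sketch also does the final distance check in the application, which is where the radius logic was recommended to live anyway.

```python
import math


def space_key(x, y, cell=1.0):
    """Label of the grid cell containing (x, y).

    With cell=1.0, the label "1,1" covers 0 < x <= 1 and 0 < y <= 1.
    """
    return f"{math.ceil(x / cell)},{math.ceil(y / cell)}"


def spaces_for_box(x_min, x_max, y_min, y_max, cell=1.0):
    """All cell labels a bounding box overlaps (one partition per label)."""
    xs = range(math.ceil(x_min / cell), math.ceil(x_max / cell) + 1)
    ys = range(math.ceil(y_min / cell), math.ceil(y_max / cell) + 1)
    return [f"{cx},{cy}" for cx in xs for cy in ys]


def within_radius(points, cx, cy, radius):
    """Client-side filter: keep (x, y) points within radius of (cx, cy)."""
    return [(x, y) for (x, y) in points if math.hypot(x - cx, y - cy) <= radius]


if __name__ == "__main__":
    # Partitions to query for a radius search around (0.9, 0.5), r = 0.3.
    print(spaces_for_box(0.6, 1.2, 0.2, 0.8))
    # Rows fetched from those partitions are then filtered in the client.
    print(within_radius([(0.8, 0.5), (1.19, 0.79)], 0.9, 0.5, 0.3))
```

The cell size is the tuning knob: smaller cells mean fewer wasted rows per query but more partitions to read for a given search area.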