Couldn't you use a bucketing strategy for the hash value, much like with time series data? That is, choose a partition key granularity that puts a reasonable number of rows in a partition, with the actual hash being the clustering key. Then ranges that fall within the partition key granularity could be queried efficiently.
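
A rough sketch of what I mean, in Python. Everything specific here is an assumption for illustration: the 32-bit Z-order (Morton) hash standing in for a geohash, the BUCKET_BITS value, and the hypothetical points_by_bucket table in the comment. The high bits of the hash become the partition key, the full hash is the clustering key, and a hash range is split into one slice per partition.

```python
# Sketch only: the 32-bit Morton-style hash, BUCKET_BITS, and the table layout
# below are illustrative assumptions, not something taken from this thread.
#
#   CREATE TABLE points_by_bucket (
#       bucket int, hash bigint, item text,
#       PRIMARY KEY ((bucket), hash, item));

HASH_BITS = 32      # total bits in the interleaved hash (16 per axis)
BUCKET_BITS = 12    # high bits used as the partition key; tune for partition size


def interleave(x: int, y: int) -> int:
    """Interleave two 16-bit integers into one 32-bit Z-order value."""
    h = 0
    for i in range(16):
        h |= ((x >> i) & 1) << (2 * i + 1)
        h |= ((y >> i) & 1) << (2 * i)
    return h


def spatial_hash(lat: float, lon: float) -> int:
    """Quantize lat/lon to 16 bits each and interleave them."""
    x = min(int((lon + 180.0) / 360.0 * 65536), 65535)
    y = min(int((lat + 90.0) / 180.0 * 65536), 65535)
    return interleave(x, y)


def bucket_of(h: int) -> int:
    """Partition key: the top BUCKET_BITS bits of the hash."""
    return h >> (HASH_BITS - BUCKET_BITS)


def partition_slices(lo: int, hi: int):
    """Split the inclusive hash range [lo, hi] into one slice per bucket."""
    shift = HASH_BITS - BUCKET_BITS
    for b in range(bucket_of(lo), bucket_of(hi) + 1):
        b_lo = b << shift
        b_hi = b_lo + (1 << shift) - 1
        yield b, max(lo, b_lo), min(hi, b_hi)
```

Each (bucket, lo, hi) slice would map to a single-partition query of the form WHERE bucket = ? AND hash >= ? AND hash <= ?. A real bounding-box search would still have to be decomposed into several hash ranges first, since one Z-order range does not correspond exactly to a rectangle.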

Jim

On Tue, May 9, 2017 at 11:19 AM, Jon Haddad <jonathan.had...@gmail.com> wrote:

> The problem with using geohashes is that you can’t efficiently do ranges
> with random token distribution. So even if your scalar values are close to
> each other numerically they’ll likely end up on different nodes, and you
> end up doing a scatter gather.
>
> If the goal is to provide a scalable solution, building a table that
> functions as an R-Tree or Quad Tree is the only way I know that can solve
> the problem without scanning the entire cluster.
>
> Jon
>
> On May 9, 2017, at 10:11 AM, Jim Ancona <j...@anconafamily.com> wrote:
>
> There are clever ways to encode coordinates into a single scalar value
> where points that are close on a surface are also close in value, making
> queries efficient. Examples are Geohash
> <https://en.wikipedia.org/wiki/Geohash> and Google's S2
> <https://docs.google.com/presentation/d/1Hl4KapfAENAOf4gv-pSngKwvS_jwNVHRPZTTDzXXn6Q/view#slide=id.i0>.
> As Jon mentions, this puts more work on the client, but might give you a
> lot of querying flexibility when using Cassandra.
>
> Jim
>
> On Mon, May 8, 2017 at 11:13 PM, Jon Haddad <jonathan.had...@gmail.com>
> wrote:
>
>> It gets a little tricky when you try to add in the coordinates to the
>> clustering key if you want to do operations that are more complex. For
>> instance, finding all the elements within a radius of point (x,y) isn’t
>> particularly fun with Cassandra. I recommend moving that logic into the
>> application.
>>
>> > On May 8, 2017, at 10:06 PM, kurt greaves <k...@instaclustr.com> wrote:
>> >
>> > Note that will not give you the desired range queries of 0 >= x <= 1 and 0 >= y <= 1.
>> >
>> > Something akin to Jon's solution could give you those range queries if you made the x and y components part of the clustering key.
>> >
>> > For example, a space of (1,1) could contain all x,y coordinates where x and y are > 0 and <= 1.
>> > You would then have a table like:
>> >
>> > CREATE TABLE geospatial (
>> >   space text,
>> >   x double,
>> >   y double,
>> >   item text,
>> >   m1 text,  -- metadata column types are not given in the thread; text is assumed
>> >   m2 text,
>> >   m3 text,
>> >   m4 text,
>> >   m5 text,
>> >   PRIMARY KEY ((space), x, y, m1, m2, m3, m4, m5)
>> > );
>> >
>> > A query of SELECT * FROM geospatial WHERE space = '1,1' AND x < 1 AND x > 0.5 AND y < 0.2 AND y > 0.1; should yield all x and y pairs and their distinct metadata. Or something like that anyway.