Couldn't you use a bucketing strategy for the hash value, much like with
time series data? That is, choose a partition key granularity that puts a
reasonable number of rows in a partition, with the actual hash being the
clustering key. Then ranges that fall within a single partition key bucket
could be queried efficiently.
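
Roughly what I have in mind, as a minimal Python sketch rather than a
tested design. The geohash encoder is the standard public algorithm; the
table and column names, the bucket length of 4 and the precision of 9 are
all made up for illustration and would need tuning against real data.

```python
# Hypothetical schema (names are assumptions, not an existing table):
#   CREATE TABLE points_by_geohash (
#       bucket text,      -- first BUCKET_LEN characters of the geohash
#       geohash text,     -- full geohash, sorted within the partition
#       item text,
#       PRIMARY KEY ((bucket), geohash, item)
#   );

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet
BUCKET_LEN = 4   # assumed partition granularity
PRECISION = 9    # assumed full geohash length


def geohash_encode(lat, lon, precision=PRECISION):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


def bucket_keys(lat, lon, bucket_len=BUCKET_LEN, precision=PRECISION):
    """Return (partition key, clustering key) for a point."""
    gh = geohash_encode(lat, lon, precision)
    return gh[:bucket_len], gh


def prefix_range(prefix, bucket_len=BUCKET_LEN):
    """Return (bucket, low, high) for a single-partition slice such as
    WHERE bucket = ? AND geohash >= ? AND geohash < ?
    '~' sorts after every geohash character, so [low, high) covers all
    hashes that start with the prefix."""
    if len(prefix) < bucket_len:
        raise ValueError("prefix is coarser than a bucket; query several buckets")
    return prefix[:bucket_len], prefix, prefix + "~"
```

A cell whose prefix is at least as long as the bucket becomes one
contiguous slice of one partition. Anything coarser than a bucket still
has to fan out over several partitions, so the bucket length is the knob
that trades partition size against fan-out.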

Jim

On Tue, May 9, 2017 at 11:19 AM, Jon Haddad <jonathan.had...@gmail.com>
wrote:

> The problem with using geohashes is that you can’t efficiently do ranges
> with random token distribution.  So even if your scalar values are close to
> each other numerically they’ll likely end up on different nodes, and you
> end up doing a scatter gather.
>
> If the goal is to provide a scalable solution, building a table that
> functions as an R-Tree or Quad Tree is the only way I know that can solve
> the problem without scanning the entire cluster.
>
> Jon
>
> On May 9, 2017, at 10:11 AM, Jim Ancona <j...@anconafamily.com> wrote:
>
> There are clever ways to encode coordinates into a single scalar value
> where points that are close on a surface are also close in value, making
> queries efficient. Examples are Geohash
> <https://en.wikipedia.org/wiki/Geohash> and Google's S2
> <https://docs.google.com/presentation/d/1Hl4KapfAENAOf4gv-pSngKwvS_jwNVHRPZTTDzXXn6Q/view#slide=id.i0>.
> As Jon mentions, this puts more work on the client, but might give you a
> lot of querying flexibility when using Cassandra.
>
> Jim
>
> On Mon, May 8, 2017 at 11:13 PM, Jon Haddad <jonathan.had...@gmail.com>
> wrote:
>
>> It gets a little tricky when you try to add in the coordinates to the
>> clustering key if you want to do operations that are more complex.  For
>> instance, finding all the elements within a radius of point (x,y) isn’t
>> particularly fun with Cassandra.  I recommend moving that logic into the
>> application.
>>
>> > On May 8, 2017, at 10:06 PM, kurt greaves <k...@instaclustr.com> wrote:
>> >
>> > Note that will not give you the desired range queries of 0 <= x <= 1
>> and 0 <= y <= 1.
>> >
>> >
>> > Something akin to Jon's solution could give you those range queries if
>> you made the x and y components part of the clustering key.
>> >
>> > For example, a space of (1,1) could contain all x,y coordinates where x
>> and y are > 0 and <= 1. You would then have a table like:
>> >
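>> > CREATE TABLE geospatial (
>> > space text,
>> > x double,
>> > y double,
>> > item text,
>> > m1 text, -- metadata types were not given; text is assumed here
>> > m2 text,
>> > m3 text,
>> > m4 text, -- m4 and m5 declared because the primary key names them
>> > m5 text,
>> > primary key ((space), x, y, m1, m2, m3, m4, m5)
>> > );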
>> >
>> > A query of select * from geospatial where space = '1,1' and x < 1 and
>> x > 0.5 and y < 0.2 and y > 0.1; should yield all x and y pairs and their
>> distinct metadata. Or something like that anyway.
>> >
>>
>
>
