Sure, I don't see why not.  Ultimately this is more or less the same thing
I proposed.   You end up with a slightly different way of encoding a point
in space into a rough geographical area.  Whether you encode them as a tree
structure or some prefix of a geohash is a matter of convenience.  I'm not
sure if there's any performance advantage to using geohashes, from a
Cassandra data model & query perspective, as I haven't spent much time with
them.  Maybe someone who's done this can chime in.

On Tue, May 9, 2017 at 1:16 PM Jim Ancona <j...@anconafamily.com> wrote:

> Couldn't you use a bucketing strategy for the hash value, much like with
> time series data? That is, choose a partition key granularity that puts a
> reasonable number of rows in a partition, with the actual hash being the
> clustering key. Then ranges that fall within the partition key granularity
> could be efficiently queried.
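Here is a minimal Python sketch of that bucketing idea. It assumes the scalar is a geohash string. The first few characters become the partition key and the full hash becomes the clustering key. `PARTITION_PRECISION` is a hypothetical tuning knob for how many characters to use. Rows inside a partition stay sorted, so a range that stays within one bucket is a single contiguous slice. The sample hashes are made-up strings, not encodings of real coordinates.

```python
import bisect
from collections import defaultdict

# Hypothetical tuning knob: geohash characters used for the partition key.
# Fewer characters -> fewer, larger partitions; more -> more, smaller ones.
PARTITION_PRECISION = 4


def bucket_key(geohash: str, precision: int = PARTITION_PRECISION) -> tuple[str, str]:
    """Split a geohash into (partition key, clustering key)."""
    return geohash[:precision], geohash


def group_by_partition(hashes: list[str], precision: int = PARTITION_PRECISION) -> dict[str, list[str]]:
    """Simulate row placement: one sorted list of clustering keys per partition."""
    partitions = defaultdict(list)
    for h in hashes:
        pk, ck = bucket_key(h, precision)
        partitions[pk].append(ck)
    return {pk: sorted(cks) for pk, cks in partitions.items()}


def prefix_range(partition: list[str], prefix: str) -> list[str]:
    """Clustering keys starting with `prefix`, taken as one contiguous slice.
    This is the equivalent of `ck >= prefix AND ck < prefix + '~'` on a sorted
    partition ('~' sorts after every geohash base32 character)."""
    lo = bisect.bisect_left(partition, prefix)
    hi = bisect.bisect_left(partition, prefix + "~")
    return partition[lo:hi]


# Illustrative strings only, not real encodings.
rows = ["u4pruydqqvj", "u4pruyd5k2m", "u4przb0000", "u4pq1234567"]
partitions = group_by_partition(rows)
print(prefix_range(partitions["u4pr"], "u4pruy"))
# -> ['u4pruyd5k2m', 'u4pruydqqvj']
```

Time buckets have the same catch. A range that crosses a bucket boundary (say from `u4pq...` into `u4pr...`) needs one query per bucket.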
>
> Jim
>
> On Tue, May 9, 2017 at 11:19 AM, Jon Haddad <jonathan.had...@gmail.com>
> wrote:
>
>> The problem with using geohashes is that you can’t efficiently do range
>> queries with random token distribution.  So even if your scalar values are
>> close to each other numerically, they’ll likely end up on different nodes,
>> and you end up doing a scatter-gather.
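The following toy sketch makes the token-distribution point concrete. It uses three nodes with made-up tokens and MD5 as a stand-in hash. This is not Cassandra's code, and the default Murmur3Partitioner uses a different hash. The point is only that partition keys sorting next to each other get tokens with no relationship to that order, so they generally land on different nodes.

```python
import bisect
import hashlib

# Hypothetical 3-node ring. Each node owns tokens above the previous node's
# token, up to and including its own. Anything above the largest token wraps
# around to the first node.
RING = [(2**125, "node-a"), (2**126, "node-b"), (3 * 2**125, "node-c")]
RING_TOKENS = [tok for tok, _ in RING]


def token(partition_key: str) -> int:
    """Stand-in partitioner: MD5 of the key folded into [0, 2**127).
    Not Cassandra's actual token code, just a hash that ignores key order."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 2**127


def owner(partition_key: str) -> str:
    """Node owning the key: first ring token >= the key's token, with wraparound."""
    i = bisect.bisect_left(RING_TOKENS, token(partition_key))
    return RING[i % len(RING)][1]


# Geohash prefixes that are adjacent in sort order...
for k in ["u4pq", "u4pr", "u4ps", "u4pt"]:
    # ...get unrelated tokens, so reading the "range" means asking several nodes.
    print(k, token(k), owner(k))
```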
>>
>> If the goal is to provide a scalable solution, building a table that
>> functions as an R-Tree or Quad Tree is the only way I know that can solve
>> the problem without scanning the entire cluster.
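Here is a sketch of the quad tree variant. It assumes a planar unit square and a fixed depth, both hypothetical choices. Each point gets a path of quadrant digits, and every prefix of that path names an enclosing cell, which is a natural partition key per level. A bounding-box query then only visits the cells that overlap the box rather than the whole key space. This is a plain quadtree, not an R-Tree.

```python
def quad_path(x: float, y: float, depth: int,
              xmin: float = 0.0, ymin: float = 0.0,
              xmax: float = 1.0, ymax: float = 1.0) -> str:
    """Quadtree path of (x, y), one digit per level (assumed unit square).
    0 = lower-left, 1 = lower-right, 2 = upper-left, 3 = upper-right."""
    path = []
    for _ in range(depth):
        xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
        quadrant = 0
        if x >= xmid:
            quadrant += 1
            xmin = xmid
        else:
            xmax = xmid
        if y >= ymid:
            quadrant += 2
            ymin = ymid
        else:
            ymax = ymid
        path.append(str(quadrant))
    return "".join(path)


def cell_keys(path: str) -> list[str]:
    """Every enclosing cell of a path, coarsest first (e.g. one partition per level)."""
    return [path[:i] for i in range(1, len(path) + 1)]


def cells_overlapping(qx0: float, qy0: float, qx1: float, qy1: float, depth: int,
                      prefix: str = "", x0: float = 0.0, y0: float = 0.0,
                      x1: float = 1.0, y1: float = 1.0) -> list[str]:
    """Cells at `depth` that intersect the query box [qx0, qx1] x [qy0, qy1]."""
    if qx1 < x0 or qx0 > x1 or qy1 < y0 or qy0 > y1:
        return []  # no overlap: prune this whole subtree
    if len(prefix) == depth:
        return [prefix]
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    children = [("0", x0, y0, xm, ym), ("1", xm, y0, x1, ym),
                ("2", x0, ym, xm, y1), ("3", xm, ym, x1, y1)]
    out = []
    for digit, cx0, cy0, cx1, cy1 in children:
        out += cells_overlapping(qx0, qy0, qx1, qy1, depth, prefix + digit,
                                 cx0, cy0, cx1, cy1)
    return out


print(quad_path(0.6, 0.2, 2))                          # -> 10
print(cells_overlapping(0.6, 0.1, 0.9, 0.2, 2))        # -> ['10', '11']
```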
>>
>> Jon
>>
>> On May 9, 2017, at 10:11 AM, Jim Ancona <j...@anconafamily.com> wrote:
>>
>> There are clever ways to encode coordinates into a single scalar value
>> where points that are close on a surface are also close in value, making
>> queries efficient. Examples are Geohash
>> <https://en.wikipedia.org/wiki/Geohash> and Google's S2
>> <https://docs.google.com/presentation/d/1Hl4KapfAENAOf4gv-pSngKwvS_jwNVHRPZTTDzXXn6Q/view#slide=id.i0>.
>> As Jon mentions, this puts more work on the client, but might give you a
>> lot of querying flexibility when using Cassandra.
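For reference, here is a compact Python version of the standard geohash encoding. It interleaves longitude and latitude bisection bits, then emits five bits per base32 character. It is written from the algorithm description rather than from any particular library. Truncating the hash gives the enclosing cell, which is what makes prefixes usable as coarse buckets. Two nearby points on opposite sides of a cell boundary can still differ from the very first character.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 11) -> str:
    """Interleave bisection bits, longitude first, 5 bits per base32 character."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = nbits = 0
    even = True  # even bit positions refine longitude, odd ones latitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        even = not even
        nbits += 1
        if nbits == 5:
            chars.append(BASE32[bits])
            bits = nbits = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744))  # -> u4pruydqqvj (the Wikipedia example)
# Very close points straddling (0, 0) share no prefix at all:
print(geohash_encode(0.0001, 0.0001, 1), geohash_encode(-0.0001, -0.0001, 1))  # -> s 7
```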
>>
>> Jim
>>
>> On Mon, May 8, 2017 at 11:13 PM, Jon Haddad <jonathan.had...@gmail.com>
>> wrote:
>>
>>> It gets a little tricky when you try to add in the coordinates to the
>>> clustering key if you want to do operations that are more complex.  For
>>> instance, finding all the elements within a radius of point (x,y) isn’t
>>> particularly fun with Cassandra.  I recommend moving that logic into the
>>> application.
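Here is one way to move the radius logic into the application, sketched for planar x/y coordinates. Cassandra is asked only for the circle's bounding square, which is a range the table can serve. The client then drops the corners with an exact distance check. `fetch_box` is a hypothetical callback standing in for whatever driver query returns the candidate points. The in-memory data is for illustration only.

```python
import math
from typing import Callable, Iterable

Point = tuple[float, float]
# Hypothetical signature: fetch_box(x_min, y_min, x_max, y_max) -> candidate points.
BoxFetcher = Callable[[float, float, float, float], Iterable[Point]]


def within_radius(center: Point, radius: float, fetch_box: BoxFetcher) -> list[Point]:
    cx, cy = center
    # Coarse step (database): every point in the circle's bounding square.
    candidates = fetch_box(cx - radius, cy - radius, cx + radius, cy + radius)
    # Fine step (application): keep points whose Euclidean distance is <= radius.
    return [(x, y) for x, y in candidates if math.hypot(x - cx, y - cy) <= radius]


# In-memory stand-in for the database, for illustration only.
POINTS = [(0.5, 0.5), (0.55, 0.5), (0.58, 0.58), (0.9, 0.9)]


def fake_fetch_box(x0: float, y0: float, x1: float, y1: float) -> list[Point]:
    return [(x, y) for x, y in POINTS if x0 <= x <= x1 and y0 <= y <= y1]


print(within_radius((0.5, 0.5), 0.1, fake_fetch_box))
# -> [(0.5, 0.5), (0.55, 0.5)]   (0.58, 0.58) is in the square but outside the circle
```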
>>>
>>> > On May 8, 2017, at 10:06 PM, kurt greaves <k...@instaclustr.com>
>>> wrote:
>>> >
>>> > Note that this will not give you the desired range queries of 0 <= x <= 1
>>> and 0 <= y <= 1.
>>> >
>>> >
>>> > Something akin to Jon's solution could give you those range queries
>>> if you made the x and y components part of the clustering key.
>>> >
>>> > For example, a space of (1,1) could contain all x,y coordinates where
>>> x and y are > 0 and <= 1. You would then have a table like:
>>> >
>>> > CREATE TABLE geospatial (
>>> > space text,
>>> > x double,
>>> > y double,
>>> > item text,
>>> > m1 text, -- m1..m3: metadata columns (text is an assumed type)
>>> > m2 text,
>>> > m3 text,
>>> > primary key ((space), x, y, m1, m2, m3)
>>> > );
>>> >
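>>> > A query such as select * from geospatial where space = '1,1' and x > 0.5
>>> and x < 1 and y > 0.1 and y < 0.2 allow filtering; should yield all x and y
>>> pairs and their distinct metadata (the y range needs allow filtering,
>>> Cassandra 3.6+, because x is already restricted by a range). Or something
>>> like that anyway.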
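Here is a sketch of how a client might map these unit-square "spaces" onto partition keys. It assumes an "<ceil(x)>,<ceil(y)>" naming scheme inferred from the '1,1' example, so '1,1' covers 0 < x <= 1 and 0 < y <= 1. A query box that crosses space boundaries becomes one single-partition statement per space it touches. Those statements can be run in parallel and merged in the application. Each statement includes ALLOW FILTERING because y is restricted after a range on x. The `%s` placeholders follow the DataStax Python driver's style for non-prepared statements. The statements are only built here, not executed.

```python
import math

# Assumed naming scheme, inferred from the '1,1' example:
# space "i,j" holds points with i-1 < x <= i and j-1 < y <= j.


def space_of(x: float, y: float) -> str:
    """Partition key of the unit square containing (x, y)."""
    return f"{math.ceil(x)},{math.ceil(y)}"


def spaces_for_box(x0: float, y0: float, x1: float, y1: float) -> list[str]:
    """Every space overlapping the box x0 < x <= x1, y0 < y <= y1."""
    xs = range(math.floor(x0) + 1, math.ceil(x1) + 1)
    ys = range(math.floor(y0) + 1, math.ceil(y1) + 1)
    return [f"{i},{j}" for i in xs for j in ys]


CQL = ("SELECT * FROM geospatial WHERE space = %s "
       "AND x > %s AND x <= %s AND y > %s AND y <= %s ALLOW FILTERING")


def per_space_queries(x0: float, y0: float, x1: float, y1: float) -> list[tuple[str, tuple]]:
    """One single-partition (statement, params) pair per space the box touches."""
    return [(CQL, (s, x0, x1, y0, y1)) for s in spaces_for_box(x0, y0, x1, y1)]


for stmt, params in per_space_queries(0.5, 0.5, 1.5, 0.8):
    print(params)
# -> ('1,1', 0.5, 1.5, 0.5, 0.8)
#    ('2,1', 0.5, 1.5, 0.5, 0.8)
```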
>>> >
>>>
>>>
>>>
>>>
>>
>>
>
