Re: Smart Table creation for 2D range query

2017-05-09 Thread Jon Haddad
Sure, I don't see why not.  Ultimately this is more or less the same thing
I proposed.   You end up with a slightly different way of encoding a point
in space into a rough geographical area.  Whether you encode them as a tree
structure or some prefix of a geohash is a matter of convenience.  I'm not
sure if there's any performance advantage to using geohashes, from a
Cassandra data model & query perspective, as I haven't spent much time with
them.  Maybe someone who's done this can chime in.

On Tue, May 9, 2017 at 1:16 PM Jim Ancona  wrote:

> Couldn't you use a bucketing strategy for the hash value, much like with
> time series data? That is, choose a partition key granularity that puts a
> reasonable number of rows in a partition, with the actual hash being the
> clustering key. Then ranges that within the partition key granularity could
> be efficiently queried.
>
> Jim
>
> On Tue, May 9, 2017 at 11:19 AM, Jon Haddad 
> wrote:
>
>> The problem with using geohashes is that you can’t efficiently do ranges
>> with random token distribution.  So even if your scalar values are close to
>> each other numerically they’ll likely end up on different nodes, and you
>> end up doing a scatter gather.
>>
>> If the goal is to provide a scalable solution, building a table that
>> functions as an R-Tree or Quad Tree is the only way I know that can solve
>> the problem without scanning the entire cluster.
>>
>> Jon
>>
>> On May 9, 2017, at 10:11 AM, Jim Ancona  wrote:
>>
>> There are clever ways to encode coordinates into a single scalar value
>> where points that are close on a surface are also close in value, making
>> queries efficient. Examples are Geohash
>>  and Google's S2
>> .
>> As Jon mentions, this puts more work on the client, but might give you a
>> lot of querying flexibility when using Cassandra.
>>
>> Jim
>>
>> On Mon, May 8, 2017 at 11:13 PM, Jon Haddad 
>> wrote:
>>
>>> It gets a little tricky when you try to add in the coordinates to the
>>> clustering key if you want to do operations that are more complex.  For
>>> instance, finding all the elements within a radius of point (x,y) isn’t
>>> particularly fun with Cassandra.  I recommend moving that logic into the
>>> application.
>>>
>>> > On May 8, 2017, at 10:06 PM, kurt greaves 
>>> wrote:
>>> >
>>> > Note that will not give you the desired range queries of 0 >= x <= 1
>>> and 0 >= y <= 1.
>>> >
>>> >
>>> > ​Something akin to Jon's solution could give you those range queries
>>> if you made the x and y components part of the clustering key.
>>> >
>>> > For example, a space of (1,1) could contain all x,y coordinates where
>>> x and y are > 0 and <= 1. You would then have a table like:
>>> >
>>> > CREATE TABLE geospatial (
>>> > space text,
>>> > x double,
>>> > y double,
>>> > item text,
>>> > m1,
>>> > m2,
>>> > m3,
>>> > primary key ((space), x, y, m1, m2, m3, m4, m5)
>>> > );
>>> >
>>> > A query of select * where space = '1,1' and x <1 and x >0.5 and y< 0.2
>>> and y>0.1; should yield all x and y pairs and their distinct metadata. Or
>>> something like that anyway.
>>> >
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>
>>>
>>
>>
>


Re: Smart Table creation for 2D range query

2017-05-09 Thread Jim Ancona
Couldn't you use a bucketing strategy for the hash value, much like with
time series data? That is, choose a partition key granularity that puts a
reasonable number of rows in a partition, with the actual hash being the
clustering key. Then ranges that within the partition key granularity could
be efficiently queried.

Jim

On Tue, May 9, 2017 at 11:19 AM, Jon Haddad 
wrote:

> The problem with using geohashes is that you can’t efficiently do ranges
> with random token distribution.  So even if your scalar values are close to
> each other numerically they’ll likely end up on different nodes, and you
> end up doing a scatter gather.
>
> If the goal is to provide a scalable solution, building a table that
> functions as an R-Tree or Quad Tree is the only way I know that can solve
> the problem without scanning the entire cluster.
>
> Jon
>
> On May 9, 2017, at 10:11 AM, Jim Ancona  wrote:
>
> There are clever ways to encode coordinates into a single scalar value
> where points that are close on a surface are also close in value, making
> queries efficient. Examples are Geohash
>  and Google's S2
> .
> As Jon mentions, this puts more work on the client, but might give you a
> lot of querying flexibility when using Cassandra.
>
> Jim
>
> On Mon, May 8, 2017 at 11:13 PM, Jon Haddad 
> wrote:
>
>> It gets a little tricky when you try to add in the coordinates to the
>> clustering key if you want to do operations that are more complex.  For
>> instance, finding all the elements within a radius of point (x,y) isn’t
>> particularly fun with Cassandra.  I recommend moving that logic into the
>> application.
>>
>> > On May 8, 2017, at 10:06 PM, kurt greaves  wrote:
>> >
>> > Note that will not give you the desired range queries of 0 >= x <= 1
>> and 0 >= y <= 1.
>> >
>> >
>> > ​Something akin to Jon's solution could give you those range queries if
>> you made the x and y components part of the clustering key.
>> >
>> > For example, a space of (1,1) could contain all x,y coordinates where x
>> and y are > 0 and <= 1. You would then have a table like:
>> >
>> > CREATE TABLE geospatial (
>> > space text,
>> > x double,
>> > y double,
>> > item text,
>> > m1,
>> > m2,
>> > m3,
>> > primary key ((space), x, y, m1, m2, m3, m4, m5)
>> > );
>> >
>> > A query of select * where space = '1,1' and x <1 and x >0.5 and y< 0.2
>> and y>0.1; should yield all x and y pairs and their distinct metadata. Or
>> something like that anyway.
>> >
>>
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>
>


Re: Smart Table creation for 2D range query

2017-05-09 Thread Jon Haddad
The problem with using geohashes is that you can’t efficiently do ranges with 
random token distribution.  So even if your scalar values are close to each 
other numerically they’ll likely end up on different nodes, and you end up 
doing a scatter gather.

If the goal is to provide a scalable solution, building a table that functions 
as an R-Tree or Quad Tree is the only way I know that can solve the problem 
without scanning the entire cluster.

Jon

> On May 9, 2017, at 10:11 AM, Jim Ancona  wrote:
> 
> There are clever ways to encode coordinates into a single scalar value where 
> points that are close on a surface are also close in value, making queries 
> efficient. Examples are Geohash  and 
> Google's S2 
> .
>  As Jon mentions, this puts more work on the client, but might give you a lot 
> of querying flexibility when using Cassandra.
> 
> Jim
> 
> On Mon, May 8, 2017 at 11:13 PM, Jon Haddad  > wrote:
> It gets a little tricky when you try to add in the coordinates to the 
> clustering key if you want to do operations that are more complex.  For 
> instance, finding all the elements within a radius of point (x,y) isn’t 
> particularly fun with Cassandra.  I recommend moving that logic into the 
> application.
> 
> > On May 8, 2017, at 10:06 PM, kurt greaves  > > wrote:
> >
> > Note that will not give you the desired range queries of 0 >= x <= 1 and 0 
> > >= y <= 1.
> >
> >
> > ​Something akin to Jon's solution could give you those range queries if you 
> > made the x and y components part of the clustering key.
> >
> > For example, a space of (1,1) could contain all x,y coordinates where x and 
> > y are > 0 and <= 1. You would then have a table like:
> >
> > CREATE TABLE geospatial (
> > space text,
> > x double,
> > y double,
> > item text,
> > m1,
> > m2,
> > m3,
> > primary key ((space), x, y, m1, m2, m3, m4, m5)
> > );
> >
> > A query of select * where space = '1,1' and x <1 and x >0.5 and y< 0.2 and 
> > y>0.1; should yield all x and y pairs and their distinct metadata. Or 
> > something like that anyway.
> >
> 
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
> 
> For additional commands, e-mail: user-h...@cassandra.apache.org 
> 
> 
> 



Re: Smart Table creation for 2D range query

2017-05-09 Thread Jim Ancona
There are clever ways to encode coordinates into a single scalar value
where points that are close on a surface are also close in value, making
queries efficient. Examples are Geohash
 and Google's S2
.
As Jon mentions, this puts more work on the client, but might give you a
lot of querying flexibility when using Cassandra.

Jim

On Mon, May 8, 2017 at 11:13 PM, Jon Haddad 
wrote:

> It gets a little tricky when you try to add in the coordinates to the
> clustering key if you want to do operations that are more complex.  For
> instance, finding all the elements within a radius of point (x,y) isn’t
> particularly fun with Cassandra.  I recommend moving that logic into the
> application.
>
> > On May 8, 2017, at 10:06 PM, kurt greaves  wrote:
> >
> > Note that will not give you the desired range queries of 0 >= x <= 1 and
> 0 >= y <= 1.
> >
> >
> > ​Something akin to Jon's solution could give you those range queries if
> you made the x and y components part of the clustering key.
> >
> > For example, a space of (1,1) could contain all x,y coordinates where x
> and y are > 0 and <= 1. You would then have a table like:
> >
> > CREATE TABLE geospatial (
> > space text,
> > x double,
> > y double,
> > item text,
> > m1,
> > m2,
> > m3,
> > primary key ((space), x, y, m1, m2, m3, m4, m5)
> > );
> >
> > A query of select * where space = '1,1' and x <1 and x >0.5 and y< 0.2
> and y>0.1; should yield all x and y pairs and their distinct metadata. Or
> something like that anyway.
> >
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Smart Table creation for 2D range query

2017-05-08 Thread Jon Haddad
It gets a little tricky when you try to add in the coordinates to the 
clustering key if you want to do operations that are more complex.  For 
instance, finding all the elements within a radius of point (x,y) isn’t 
particularly fun with Cassandra.  I recommend moving that logic into the 
application.  

> On May 8, 2017, at 10:06 PM, kurt greaves  wrote:
> 
> Note that will not give you the desired range queries of 0 >= x <= 1 and 0 >= 
> y <= 1.
> 
> 
> ​Something akin to Jon's solution could give you those range queries if you 
> made the x and y components part of the clustering key.
> 
> For example, a space of (1,1) could contain all x,y coordinates where x and y 
> are > 0 and <= 1. You would then have a table like:
> 
> CREATE TABLE geospatial (
> space text,
> x double,
> y double,
> item text,
> m1,
> m2,
> m3,
> primary key ((space), x, y, m1, m2, m3, m4, m5)
> );
> 
> A query of select * where space = '1,1' and x <1 and x >0.5 and y< 0.2 and 
> y>0.1; should yield all x and y pairs and their distinct metadata. Or 
> something like that anyway.
> 


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Smart Table creation for 2D range query

2017-05-08 Thread kurt greaves
Note that will not give you the desired range queries of 0 >= x <= 1 and 0
>= y <= 1.


​Something akin to Jon's solution could give you those range queries if you
made the x and y components part of the clustering key.

For example, a space of (1,1) could contain all x,y coordinates where x and
y are > 0 and <= 1. You would then have a table like:

CREATE TABLE geospatial (
space text,
x double,
y double,
item text,
m1,
m2,
m3,
primary key ((space), x, y, m1, m2, m3, m4, m5)
);

A query of select * where space = '1,1' and x <1 and x >0.5 and y< 0.2 and
y>0.1; should yield all x and y pairs and their distinct metadata. Or
something like that anyway.


Re: Smart Table creation for 2D range query

2017-05-08 Thread Anthony Grasso
Hi Lydia,

Yes. This will define the *x*, *y* columns as the components of the
partition key. Note that by doing this both *x* and *y* values will be
required to at a minimum to perform a valid query.

Alternatively, the *x* and *y* values could be combined in into a single
text field as Jon has suggested.

Kind regards,
Anthony

On 7 May 2017 at 17:15, Lydia Ickler  wrote:

> Like this?
>
> CREATE TABLE test (
>   x double,
>   y double,
>   m1 int,
>   ...
>   m5 int,
>   PRIMARY KEY ((x,y), m1, … , m5)
> )
>
>
>
> Am 05.05.2017 um 21:54 schrieb Nitan Kainth :
>
> Make metadata as partition key and x,y as part of partition key i.e.
> Primary key. It should work
>
> Sent from my iPhone
>
> On May 5, 2017, at 2:40 PM, Lydia  wrote:
>
>
> Hi all,
>
>
> I am new to Apache Cassandra and I would like to get some advice on how to
> tackle a table creation / indexing in a sophisticated way.
>
>
> My aim is to store x- and y-coordinates, accompanied by some columns with
> meta information (m1, ... ,m5). There will be around 100,000,000 rows
> overall. Some rows might have the same (x,y) pairs but always distinct meta
> information.
>
>
> In the end I want to do a rather simple range query in the form of e.g. (0
> >= x <= 1) AND (0 >= y <= 1).
>
>
> What would be the best choice of variables to set as primary key,
> partition key. Or should I use a index? And if so on what column(s)?
>
>
> Thanks in advance!
>
> Best regards,
>
> Lydia
>
> -
>
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
>


Re: Smart Table creation for 2D range query

2017-05-07 Thread Lydia Ickler
Like this?

CREATE TABLE test ( x double, y double, m1 int, ... m5 int, PRIMARY KEY ((x,y), 
m1, … , m5) )


> Am 05.05.2017 um 21:54 schrieb Nitan Kainth :
> 
> Make metadata as partition key and x,y as part of partition key i.e. Primary 
> key. It should work
> 
> Sent from my iPhone
> 
>> On May 5, 2017, at 2:40 PM, Lydia  wrote:
>> 
>> Hi all,
>> 
>> I am new to Apache Cassandra and I would like to get some advice on how to 
>> tackle a table creation / indexing in a sophisticated way.
>> 
>> My aim is to store x- and y-coordinates, accompanied by some columns with 
>> meta information (m1, ... ,m5). There will be around 100,000,000 rows 
>> overall. Some rows might have the same (x,y) pairs but always distinct meta 
>> information. 
>> 
>> In the end I want to do a rather simple range query in the form of e.g. (0 
>> >= x <= 1) AND (0 >= y <= 1).
>> 
>> What would be the best choice of variables to set as primary key, partition 
>> key. Or should I use a index? And if so on what column(s)?
>> 
>> Thanks in advance!
>> Best regards, 
>> Lydia
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>> 


Re: Smart Table creation for 2D range query

2017-05-05 Thread Jon Haddad
I think you’ll want to model your table similar to how an R-Tree [1] / Quad 
tree [2] works.  Let’s suppose you had a 10x10 meter land area and you wanted 
to put stuff in there.  In order to find “all the things in point x,y”, you 
could break your land area into a grid.  A partition would contain all the 
items that are in that grid space.  In my simple example, I’d have 100 
partitions.

For example:

// space is a simple "x.y" text field
CREATE TABLE geospatial (
space text,
item text,
primary key (space, item)
);

insert into geospatial (space, item) values ('1.1', 'hat');
insert into geospatial (space, item) values ('1.1', 'bird');
insert into geospatial (space, item) values ('6.4', 'dog’);

This example is pretty trivial, and doesn’t take into account hot partitions.  
That’s where the process of subdividing a space occurs when it reaches a 
certain size.

[1] https://en.wikipedia.org/wiki/R-tree 
[2] https://en.wikipedia.org/wiki/Quadtree 

> On May 5, 2017, at 12:54 PM, Nitan Kainth  wrote:
> 
> Make metadata as partition key and x,y as part of partition key i.e. Primary 
> key. It should work
> 
> Sent from my iPhone
> 
>> On May 5, 2017, at 2:40 PM, Lydia  wrote:
>> 
>> Hi all,
>> 
>> I am new to Apache Cassandra and I would like to get some advice on how to 
>> tackle a table creation / indexing in a sophisticated way.
>> 
>> My aim is to store x- and y-coordinates, accompanied by some columns with 
>> meta information (m1, ... ,m5). There will be around 100,000,000 rows 
>> overall. Some rows might have the same (x,y) pairs but always distinct meta 
>> information. 
>> 
>> In the end I want to do a rather simple range query in the form of e.g. (0 
>> >= x <= 1) AND (0 >= y <= 1).
>> 
>> What would be the best choice of variables to set as primary key, partition 
>> key. Or should I use a index? And if so on what column(s)?
>> 
>> Thanks in advance!
>> Best regards, 
>> Lydia
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>> 
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 



Re: Smart Table creation for 2D range query

2017-05-05 Thread Nitan Kainth
Make metadata as partition key and x,y as part of partition key i.e. Primary 
key. It should work

Sent from my iPhone

> On May 5, 2017, at 2:40 PM, Lydia  wrote:
> 
> Hi all,
> 
> I am new to Apache Cassandra and I would like to get some advice on how to 
> tackle a table creation / indexing in a sophisticated way.
> 
> My aim is to store x- and y-coordinates, accompanied by some columns with 
> meta information (m1, ... ,m5). There will be around 100,000,000 rows 
> overall. Some rows might have the same (x,y) pairs but always distinct meta 
> information. 
> 
> In the end I want to do a rather simple range query in the form of e.g. (0 >= 
> x <= 1) AND (0 >= y <= 1).
> 
> What would be the best choice of variables to set as primary key, partition 
> key. Or should I use a index? And if so on what column(s)?
> 
> Thanks in advance!
> Best regards, 
> Lydia
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org