Re: 2d or multi dimension range query in cassandra CQL
i am using Startio Cassandra it way better than stargate as it works on the latest release of Cassandra and is better on my performance. we are using it for fulltext search use case Regards Asit On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi, On the basis of some suggestions, I tried using tuplejump for multidimensional queries. Since other mostly needed root permissions (for building ) which I don't have on my cluster account. I found a major problem in tuplejump (stargate-core). When I am using it with a list type field in my table. It stops working. For e.g. create table person ( id int primary key, isActive boolean, age int, eyeColor varchar, name text, gender varchar, company varchar, email varchar, phone varchar, address text, points listdouble, stargate text ); with indexing as: CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some 'com.tuplejump.stargate.RowIndex' WITH options = { 'sg_options':'{ fields:{ eyeColor:{}, age:{}, phone:{} } }' }; If I insert data in the table along with points list. The following query won't give any results (0 rows): SELECT * FROM RESULTS1 WHERE stargate ='{ filter: { type: range, field: x, lower: 0 } }'; I tried removing points listdouble from the table and it works i.e. same query will return results. Can somebody help me with this problem as I couldn't find much support from Stargate. Please note that I am using Cassandra 2.0.9 compatible with Stargate-core as given in link ( http://stargate-core.readthedocs.org/en/latest/quickstart.html). Thanks, Mehak On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña adelap...@stratio.com wrote: Hi, With Stratio Cassandra you can create Lucene based indexes for multidimensional queries this way: ALTER TABLE images.results1 ADD lucene text ; CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene) USING 'com.stratio.cassandra.index.RowIndex' WITH OPTIONS = { 'refresh_seconds':'1', 'schema':'{ fields:{ image_caseid:{type:string}, x:{type:double}, y:{type:double} } } '}; Then you can perform the query using the dummy column: SELECT * FROM images.results1 WHERE lucene='{ filter:{type:boolean, must:[ {field:image_caseid, type:match, value:mehak}, {field:x, type:range, lower:100}, {field:y, type:range, lower:100} ]}}'; However, you can take advantage of partition key to route the query only to the nodes owning the data: SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{ filter:{type:boolean, must:[ {field:x, type:range, lower:100}, {field:y, type:range, lower:100} ]}}'; Or, even better: SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x100 AND lucene='{ filter:{field:y, type:range, lower:100}}'; Additionally, if your data are geospatial (latitude and longitude), soon you will can use the incoming spatial features. 2015-03-17 23:01 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu: Sorry I gave you wrong table definition for query. Here a composite key of image_caseid, x and uuid which is unique. I have used x in clustering columns to query it. And used secondary index on y column. 1. Example *cqlsh:images CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, submit_date timestamp, points listdouble, PRIMARY KEY ((image_caseid),x,uuid));* *cqlsh:images create index results1_y on results1(y);* In the below query you can see I have image_caseid as primary key which is filtered. Even then it is giving error that *No indexed columns present* *cqlsh:images select * from results1 where image_caseid='mehak' and x 100 and y100 order by image_caseid asc;* *code=2200 [Invalid query] message=No indexed columns present in by-columns clause with Equal operator* 2. Example I also tried including both x and y columns as composite key even then query gives following error: *cqlsh:images CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, submit_date timestamp, points listdouble, PRIMARY KEY ((image_caseid),x,y,uuid));* *cqlsh:images select * from results1 where image_caseid='mehak' and x 100 and y100 order by image_caseid asc;* *code=2200 [Invalid query] message=PRIMARY KEY column y cannot be restricted (preceding column ColumnDefinition{name=x, type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null} is either not restricted or by a non-EQ relation)* Thanks, Mehak On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky jack.krupan...@gmail.com wrote: Yeah, you may have to add a dummy column populated with a constant,
Re: 2d or multi dimension range query in cassandra CQL
Hi, I checked Startio Cassandra but couldn't get any good documentation for the same. Can you give me some pointers on how to use it. Do I have to build it from the source or I can use it directly with jar files as in case of Stargate. Since I was looking for solution which I don't need a full build and can be used with existing tar of cassandra because I have some restrictions on installing stuff on my server. Thanks, Mehak On Mon, Mar 23, 2015 at 8:17 AM, Asit KAUSHIK asitkaushikno...@gmail.com wrote: i am using Startio Cassandra it way better than stargate as it works on the latest release of Cassandra and is better on my performance. we are using it for fulltext search use case Regards Asit On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi, On the basis of some suggestions, I tried using tuplejump for multidimensional queries. Since other mostly needed root permissions (for building ) which I don't have on my cluster account. I found a major problem in tuplejump (stargate-core). When I am using it with a list type field in my table. It stops working. For e.g. create table person ( id int primary key, isActive boolean, age int, eyeColor varchar, name text, gender varchar, company varchar, email varchar, phone varchar, address text, points listdouble, stargate text ); with indexing as: CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some 'com.tuplejump.stargate.RowIndex' WITH options = { 'sg_options':'{ fields:{ eyeColor:{}, age:{}, phone:{} } }' }; If I insert data in the table along with points list. The following query won't give any results (0 rows): SELECT * FROM RESULTS1 WHERE stargate ='{ filter: { type: range, field: x, lower: 0 } }'; I tried removing points listdouble from the table and it works i.e. same query will return results. Can somebody help me with this problem as I couldn't find much support from Stargate. Please note that I am using Cassandra 2.0.9 compatible with Stargate-core as given in link ( http://stargate-core.readthedocs.org/en/latest/quickstart.html). Thanks, Mehak On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña adelap...@stratio.com wrote: Hi, With Stratio Cassandra you can create Lucene based indexes for multidimensional queries this way: ALTER TABLE images.results1 ADD lucene text ; CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene) USING 'com.stratio.cassandra.index.RowIndex' WITH OPTIONS = { 'refresh_seconds':'1', 'schema':'{ fields:{ image_caseid:{type:string}, x:{type:double}, y:{type:double} } } '}; Then you can perform the query using the dummy column: SELECT * FROM images.results1 WHERE lucene='{ filter:{type:boolean, must:[ {field:image_caseid, type:match, value:mehak}, {field:x, type:range, lower:100}, {field:y, type:range, lower:100} ]}}'; However, you can take advantage of partition key to route the query only to the nodes owning the data: SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{ filter:{type:boolean, must:[ {field:x, type:range, lower:100}, {field:y, type:range, lower:100} ]}}'; Or, even better: SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x100 AND lucene='{ filter:{field:y, type:range, lower:100}}'; Additionally, if your data are geospatial (latitude and longitude), soon you will can use the incoming spatial features. 2015-03-17 23:01 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu: Sorry I gave you wrong table definition for query. Here a composite key of image_caseid, x and uuid which is unique. I have used x in clustering columns to query it. And used secondary index on y column. 1. Example *cqlsh:images CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, submit_date timestamp, points listdouble, PRIMARY KEY ((image_caseid),x,uuid));* *cqlsh:images create index results1_y on results1(y);* In the below query you can see I have image_caseid as primary key which is filtered. Even then it is giving error that *No indexed columns present* *cqlsh:images select * from results1 where image_caseid='mehak' and x 100 and y100 order by image_caseid asc;* *code=2200 [Invalid query] message=No indexed columns present in by-columns clause with Equal operator* 2. Example I also tried including both x and y columns as composite key even then query gives following error: *cqlsh:images CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, submit_date timestamp, points listdouble, PRIMARY KEY ((image_caseid),x,y,uuid));* *cqlsh:images select * from results1 where
Re: 2d or multi dimension range query in cassandra CQL
Hi, You can download Stratio Cassandra binaries from https://s3.amazonaws.com/stratioorg/cassandra/stratio-cassandra-2.1.3.1-bin.tar.gz You can get info about how to build and getting started at its README file https://github.com/Stratio/stratio-cassandra/blob/master/README.md. More detailed info can be found at https://github.com/Stratio/stratio-cassandra/blob/master/doc/stratio/extended-search-in-cassandra.md . Regards, 2015-03-23 18:07 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu: Hi, I checked Startio Cassandra but couldn't get any good documentation for the same. Can you give me some pointers on how to use it. Do I have to build it from the source or I can use it directly with jar files as in case of Stargate. Since I was looking for solution which I don't need a full build and can be used with existing tar of cassandra because I have some restrictions on installing stuff on my server. Thanks, Mehak On Mon, Mar 23, 2015 at 8:17 AM, Asit KAUSHIK asitkaushikno...@gmail.com wrote: i am using Startio Cassandra it way better than stargate as it works on the latest release of Cassandra and is better on my performance. we are using it for fulltext search use case Regards Asit On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi, On the basis of some suggestions, I tried using tuplejump for multidimensional queries. Since other mostly needed root permissions (for building ) which I don't have on my cluster account. I found a major problem in tuplejump (stargate-core). When I am using it with a list type field in my table. It stops working. For e.g. create table person ( id int primary key, isActive boolean, age int, eyeColor varchar, name text, gender varchar, company varchar, email varchar, phone varchar, address text, points listdouble, stargate text ); with indexing as: CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some 'com.tuplejump.stargate.RowIndex' WITH options = { 'sg_options':'{ fields:{ eyeColor:{}, age:{}, phone:{} } }' }; If I insert data in the table along with points list. The following query won't give any results (0 rows): SELECT * FROM RESULTS1 WHERE stargate ='{ filter: { type: range, field: x, lower: 0 } }'; I tried removing points listdouble from the table and it works i.e. same query will return results. Can somebody help me with this problem as I couldn't find much support from Stargate. Please note that I am using Cassandra 2.0.9 compatible with Stargate-core as given in link ( http://stargate-core.readthedocs.org/en/latest/quickstart.html). Thanks, Mehak On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña adelap...@stratio.com wrote: Hi, With Stratio Cassandra you can create Lucene based indexes for multidimensional queries this way: ALTER TABLE images.results1 ADD lucene text ; CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene) USING 'com.stratio.cassandra.index.RowIndex' WITH OPTIONS = { 'refresh_seconds':'1', 'schema':'{ fields:{ image_caseid:{type:string}, x:{type:double}, y:{type:double} } } '}; Then you can perform the query using the dummy column: SELECT * FROM images.results1 WHERE lucene='{ filter:{type:boolean, must:[ {field:image_caseid, type:match, value:mehak}, {field:x, type:range, lower:100}, {field:y, type:range, lower:100} ]}}'; However, you can take advantage of partition key to route the query only to the nodes owning the data: SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{ filter:{type:boolean, must:[ {field:x, type:range, lower:100}, {field:y, type:range, lower:100} ]}}'; Or, even better: SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x100 AND lucene='{ filter:{field:y, type:range, lower:100}}'; Additionally, if your data are geospatial (latitude and longitude), soon you will can use the incoming spatial features. 2015-03-17 23:01 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu: Sorry I gave you wrong table definition for query. Here a composite key of image_caseid, x and uuid which is unique. I have used x in clustering columns to query it. And used secondary index on y column. 1. Example *cqlsh:images CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, submit_date timestamp, points listdouble, PRIMARY KEY ((image_caseid),x,uuid));* *cqlsh:images create index results1_y on results1(y);* In the below query you can see I have image_caseid as primary key which is filtered. Even then it is giving error that *No indexed columns present* *cqlsh:images select * from results1 where image_caseid='mehak' and x 100 and y100 order by image_caseid asc;*
Re: 2d or multi dimension range query in cassandra CQL
1. Create multiple secondary indexes, one for each non-key column you need to index on. Not recommended. Considered an anti-pattern for Cassandra. 2. Use DSE Search/Solr. 3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio. -- Jack Krupansky On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi, I want to perform range queries (as in x and y ranges) on a large data billions of rows. CQL allows me to put Non EQ restrictions on only one of the clustering columns. Its not allowing me to filter the data using any other column even with use of Allow Filtering option. cqlsh:images *select * from results1 where image_caseid='mehak' and x 100 and y 100 order by x allow filtering;* *code=2200 [Invalid query] message=No indexed columns present in by-columns clause with Equal operator* My table definition is : *CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar, Area float, submit_date timestamp, points listdouble, PRIMARY KEY ((image_caseid),x));* Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently. Thanks, Mehak
Re: 2d or multi dimension range query in cassandra CQL
Lucene or Solr is better. I’m using lucene 在 2015年3月18日,上午2:11,Jack Krupansky jack.krupan...@gmail.com 写道: 1. Create multiple secondary indexes, one for each non-key column you need to index on. Not recommended. Considered an anti-pattern for Cassandra. 2. Use DSE Search/Solr. 3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio. -- Jack Krupansky On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta meme...@cs.stonybrook.edu mailto:meme...@cs.stonybrook.edu wrote: Hi, I want to perform range queries (as in x and y ranges) on a large data billions of rows. CQL allows me to put Non EQ restrictions on only one of the clustering columns. Its not allowing me to filter the data using any other column even with use of Allow Filtering option. cqlsh:images select * from results1 where image_caseid='mehak' and x 100 and y 100 order by x allow filtering; code=2200 [Invalid query] message=No indexed columns present in by-columns clause with Equal operator My table definition is : CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar, Area float, submit_date timestamp, points listdouble, PRIMARY KEY ((image_caseid),x)); Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently. Thanks, Mehak
2d or multi dimension range query in cassandra CQL
Hi, I want to perform range queries (as in x and y ranges) on a large data billions of rows. CQL allows me to put Non EQ restrictions on only one of the clustering columns. Its not allowing me to filter the data using any other column even with use of Allow Filtering option. cqlsh:images *select * from results1 where image_caseid='mehak' and x 100 and y 100 order by x allow filtering;* *code=2200 [Invalid query] message=No indexed columns present in by-columns clause with Equal operator* My table definition is : *CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar, Area float, submit_date timestamp, points listdouble, PRIMARY KEY ((image_caseid),x));* Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently. Thanks, Mehak