Re: 2d or multi dimension range query in cassandra CQL

2015-03-23 Thread Asit KAUSHIK
i am using Startio Cassandra it way better than stargate as it works on the
latest release of Cassandra and is better on my performance.

we are using it for fulltext search use case

Regards
Asit

On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta meme...@cs.stonybrook.edu
wrote:

 Hi,

 On the basis of some suggestions, I tried using tuplejump for multidimensional
 queries. Since other mostly needed root permissions (for building ) which I
 don't have on my cluster account.

 I found a major problem in tuplejump (stargate-core). When I am using it
 with a list type field in my table. It stops working.
 For e.g.

 create table person (
 id int primary key,
 isActive boolean,
 age int,
 eyeColor varchar,
 name text,
 gender varchar,
 company varchar,
 email varchar,
 phone varchar,
 address text,
 points listdouble,
 stargate text
 );

 with indexing as:
 CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some
 'com.tuplejump.stargate.RowIndex' WITH options =
 {
 'sg_options':'{
 fields:{
 eyeColor:{},
 age:{},
 phone:{}
 }
 }'
 };

 If I insert data in the table along with points list. The following query
 won't give any results (0 rows):

 SELECT * FROM RESULTS1 WHERE stargate ='{
 filter: {
 type: range,
 field: x,
 lower: 0
 }
 }';

 I tried removing points listdouble from the table and it works i.e.
 same query will return results.
 Can somebody help me with this problem as I couldn't find much support
 from Stargate.

 Please note that I am using Cassandra 2.0.9 compatible with Stargate-core
 as given in link (
 http://stargate-core.readthedocs.org/en/latest/quickstart.html).

 Thanks,
 Mehak


 On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña adelap...@stratio.com
 wrote:

 Hi,

 With Stratio Cassandra you can create Lucene based indexes for
 multidimensional queries this way:

 ALTER TABLE images.results1 ADD lucene text ;

 CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene)
 USING 'com.stratio.cassandra.index.RowIndex'
 WITH OPTIONS = {
  'refresh_seconds':'1',
  'schema':'{
   fields:{
   image_caseid:{type:string},
 x:{type:double},
 y:{type:double} } } '};

 Then you can perform the query using the dummy column:

 SELECT * FROM images.results1 WHERE lucene='{ filter:{type:boolean,
 must:[
 {field:image_caseid, type:match, value:mehak},
 {field:x, type:range, lower:100},
 {field:y, type:range, lower:100}
 ]}}';

 However, you can take advantage of partition key to route the query only
 to the nodes owning the data:

 SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{
 filter:{type:boolean, must:[
 {field:x, type:range, lower:100},
 {field:y, type:range, lower:100}
 ]}}';

 Or, even better:

 SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x100 AND
 lucene='{ filter:{field:y, type:range, lower:100}}';

 Additionally, if your data are geospatial (latitude and longitude), soon
 you will can use the incoming spatial features.



 2015-03-17 23:01 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu:

 Sorry I gave you wrong table definition for query. Here a composite key
 of image_caseid, x and uuid which is unique. I have used x in clustering
 columns to query it. And used secondary index on y column.

 1. Example
 *cqlsh:images CREATE TABLE images.results1 (uuid uuid,
 analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y
 double, submit_date timestamp, points listdouble,  PRIMARY KEY
 ((image_caseid),x,uuid));*
 *cqlsh:images create index results1_y on results1(y);*

 In the below query you can see I have image_caseid as primary key which
 is filtered. Even then it is giving error that *No indexed columns
 present*

 *cqlsh:images select * from results1 where image_caseid='mehak' and x 
 100 and y100 order by image_caseid asc;*
 *code=2200 [Invalid query] message=No indexed columns present in
 by-columns clause with Equal operator*

 2. Example
 I also tried including both x and y columns as composite key even then
 query gives following error:

 *cqlsh:images CREATE TABLE images.results1 (uuid uuid,
 analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y
 double, submit_date timestamp, points listdouble,  PRIMARY KEY
 ((image_caseid),x,y,uuid));*

 *cqlsh:images select * from results1 where image_caseid='mehak' and x 
 100 and y100 order by image_caseid asc;*
 *code=2200 [Invalid query] message=PRIMARY KEY column y cannot be
 restricted (preceding column ColumnDefinition{name=x,
 type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN,
 componentIndex=0, indexName=null, indexType=null} is either not restricted
 or by a non-EQ relation)*

 Thanks,
 Mehak


 On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky 
 jack.krupan...@gmail.com wrote:

 Yeah, you may have to add a dummy column populated with a constant, 

Re: 2d or multi dimension range query in cassandra CQL

2015-03-23 Thread Mehak Mehta
Hi,

I checked Startio Cassandra but couldn't get any good documentation for the
same.
Can you give me some pointers on how to use it.
Do I have to build it from the source or I can use it directly with jar
files as in case of Stargate.
Since I was looking for solution which I don't need a full build and can be
used with existing tar of cassandra because I have some restrictions on
installing stuff on my server.

Thanks,
Mehak

On Mon, Mar 23, 2015 at 8:17 AM, Asit KAUSHIK asitkaushikno...@gmail.com
wrote:

 i am using Startio Cassandra it way better than stargate as it works on
 the latest release of Cassandra and is better on my performance.

 we are using it for fulltext search use case

 Regards
 Asit

 On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta meme...@cs.stonybrook.edu
 wrote:

 Hi,

 On the basis of some suggestions, I tried using tuplejump for 
 multidimensional
 queries. Since other mostly needed root permissions (for building ) which I
 don't have on my cluster account.

 I found a major problem in tuplejump (stargate-core). When I am using it
 with a list type field in my table. It stops working.
 For e.g.

 create table person (
 id int primary key,
 isActive boolean,
 age int,
 eyeColor varchar,
 name text,
 gender varchar,
 company varchar,
 email varchar,
 phone varchar,
 address text,
 points listdouble,
 stargate text
 );

 with indexing as:
 CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some
 'com.tuplejump.stargate.RowIndex' WITH options =
 {
 'sg_options':'{
 fields:{
 eyeColor:{},
 age:{},
 phone:{}
 }
 }'
 };

 If I insert data in the table along with points list. The following query
 won't give any results (0 rows):

 SELECT * FROM RESULTS1 WHERE stargate ='{
 filter: {
 type: range,
 field: x,
 lower: 0
 }
 }';

 I tried removing points listdouble from the table and it works i.e.
 same query will return results.
 Can somebody help me with this problem as I couldn't find much support
 from Stargate.

 Please note that I am using Cassandra 2.0.9 compatible with Stargate-core
 as given in link (
 http://stargate-core.readthedocs.org/en/latest/quickstart.html).

 Thanks,
 Mehak


 On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña adelap...@stratio.com
  wrote:

 Hi,

 With Stratio Cassandra you can create Lucene based indexes for
 multidimensional queries this way:

 ALTER TABLE images.results1 ADD lucene text ;

 CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene)
 USING 'com.stratio.cassandra.index.RowIndex'
 WITH OPTIONS = {
  'refresh_seconds':'1',
  'schema':'{
   fields:{
   image_caseid:{type:string},
 x:{type:double},
 y:{type:double} } } '};

 Then you can perform the query using the dummy column:

 SELECT * FROM images.results1 WHERE lucene='{ filter:{type:boolean,
 must:[
 {field:image_caseid, type:match, value:mehak},
 {field:x, type:range, lower:100},
 {field:y, type:range, lower:100}
 ]}}';

 However, you can take advantage of partition key to route the query only
 to the nodes owning the data:

 SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{
 filter:{type:boolean, must:[
 {field:x, type:range, lower:100},
 {field:y, type:range, lower:100}
 ]}}';

 Or, even better:

 SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x100 AND
 lucene='{ filter:{field:y, type:range, lower:100}}';

 Additionally, if your data are geospatial (latitude and longitude), soon
 you will can use the incoming spatial features.



 2015-03-17 23:01 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu:

 Sorry I gave you wrong table definition for query. Here a composite key
 of image_caseid, x and uuid which is unique. I have used x in clustering
 columns to query it. And used secondary index on y column.

 1. Example
 *cqlsh:images CREATE TABLE images.results1 (uuid uuid,
 analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y
 double, submit_date timestamp, points listdouble,  PRIMARY KEY
 ((image_caseid),x,uuid));*
 *cqlsh:images create index results1_y on results1(y);*

 In the below query you can see I have image_caseid as primary key which
 is filtered. Even then it is giving error that *No indexed columns
 present*

 *cqlsh:images select * from results1 where image_caseid='mehak' and x
  100 and y100 order by image_caseid asc;*
 *code=2200 [Invalid query] message=No indexed columns present in
 by-columns clause with Equal operator*

 2. Example
 I also tried including both x and y columns as composite key even then
 query gives following error:

 *cqlsh:images CREATE TABLE images.results1 (uuid uuid,
 analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y
 double, submit_date timestamp, points listdouble,  PRIMARY KEY
 ((image_caseid),x,y,uuid));*

 *cqlsh:images select * from results1 where 

Re: 2d or multi dimension range query in cassandra CQL

2015-03-23 Thread Andres de la Peña
Hi,

You can download Stratio Cassandra binaries from
https://s3.amazonaws.com/stratioorg/cassandra/stratio-cassandra-2.1.3.1-bin.tar.gz

You can get info about how to build and getting started at its README file
https://github.com/Stratio/stratio-cassandra/blob/master/README.md. More
detailed info can be found at
https://github.com/Stratio/stratio-cassandra/blob/master/doc/stratio/extended-search-in-cassandra.md
.

Regards,

2015-03-23 18:07 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu:

 Hi,

 I checked Startio Cassandra but couldn't get any good documentation for
 the same.
 Can you give me some pointers on how to use it.
 Do I have to build it from the source or I can use it directly with jar
 files as in case of Stargate.
 Since I was looking for solution which I don't need a full build and can
 be used with existing tar of cassandra because I have some restrictions on
 installing stuff on my server.

 Thanks,
 Mehak

 On Mon, Mar 23, 2015 at 8:17 AM, Asit KAUSHIK asitkaushikno...@gmail.com
 wrote:

 i am using Startio Cassandra it way better than stargate as it works on
 the latest release of Cassandra and is better on my performance.

 we are using it for fulltext search use case

 Regards
 Asit

 On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta meme...@cs.stonybrook.edu
 wrote:

 Hi,

 On the basis of some suggestions, I tried using tuplejump for 
 multidimensional
 queries. Since other mostly needed root permissions (for building ) which I
 don't have on my cluster account.

 I found a major problem in tuplejump (stargate-core). When I am using it
 with a list type field in my table. It stops working.
 For e.g.

 create table person (
 id int primary key,
 isActive boolean,
 age int,
 eyeColor varchar,
 name text,
 gender varchar,
 company varchar,
 email varchar,
 phone varchar,
 address text,
 points listdouble,
 stargate text
 );

 with indexing as:
 CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some
 'com.tuplejump.stargate.RowIndex' WITH options =
 {
 'sg_options':'{
 fields:{
 eyeColor:{},
 age:{},
 phone:{}
 }
 }'
 };

 If I insert data in the table along with points list. The following
 query won't give any results (0 rows):

 SELECT * FROM RESULTS1 WHERE stargate ='{
 filter: {
 type: range,
 field: x,
 lower: 0
 }
 }';

 I tried removing points listdouble from the table and it works i.e.
 same query will return results.
 Can somebody help me with this problem as I couldn't find much support
 from Stargate.

 Please note that I am using Cassandra 2.0.9 compatible with
 Stargate-core as given in link (
 http://stargate-core.readthedocs.org/en/latest/quickstart.html).

 Thanks,
 Mehak


 On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña 
 adelap...@stratio.com wrote:

 Hi,

 With Stratio Cassandra you can create Lucene based indexes for
 multidimensional queries this way:

 ALTER TABLE images.results1 ADD lucene text ;

 CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene)
 USING 'com.stratio.cassandra.index.RowIndex'
 WITH OPTIONS = {
  'refresh_seconds':'1',
  'schema':'{
   fields:{
   image_caseid:{type:string},
 x:{type:double},
 y:{type:double} } } '};

 Then you can perform the query using the dummy column:

 SELECT * FROM images.results1 WHERE lucene='{ filter:{type:boolean,
 must:[
 {field:image_caseid, type:match, value:mehak},
 {field:x, type:range, lower:100},
 {field:y, type:range, lower:100}
 ]}}';

 However, you can take advantage of partition key to route the query
 only to the nodes owning the data:

 SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{
 filter:{type:boolean, must:[
 {field:x, type:range, lower:100},
 {field:y, type:range, lower:100}
 ]}}';

 Or, even better:

 SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x100 AND
 lucene='{ filter:{field:y, type:range, lower:100}}';

 Additionally, if your data are geospatial (latitude and longitude),
 soon you will can use the incoming spatial features.



 2015-03-17 23:01 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu:

 Sorry I gave you wrong table definition for query. Here a composite
 key of image_caseid, x and uuid which is unique. I have used x in
 clustering columns to query it. And used secondary index on y column.

 1. Example
 *cqlsh:images CREATE TABLE images.results1 (uuid uuid,
 analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y
 double, submit_date timestamp, points listdouble,  PRIMARY KEY
 ((image_caseid),x,uuid));*
 *cqlsh:images create index results1_y on results1(y);*

 In the below query you can see I have image_caseid as primary key
 which is filtered. Even then it is giving error that *No indexed
 columns present*

 *cqlsh:images select * from results1 where image_caseid='mehak' and x
  100 and y100 order by image_caseid asc;*

Re: 2d or multi dimension range query in cassandra CQL

2015-03-17 Thread Jack Krupansky
1. Create multiple secondary indexes, one for each non-key column you need
to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta meme...@cs.stonybrook.edu
wrote:

 Hi,

 I want to perform range queries (as in x and y ranges) on a large data
 billions of rows.
 CQL allows me to put Non EQ restrictions on only one of the clustering
 columns.
 Its not allowing me to filter the data using any other column even with
 use of Allow Filtering option.

 cqlsh:images *select * from results1 where image_caseid='mehak' and x 
 100 and y 100 order by x allow filtering;*
 *code=2200 [Invalid query] message=No indexed columns present in
 by-columns clause with Equal operator*

 My table definition is :

 *CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar,
 analysis_execution_uuid uuid, x  double, y double, loc varchar, w double, h
 double, normalized varchar, type varchar, filehost varchar, filename
 varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
 image_mpp_x double, image_mpp_y double, image_width double, image_height
 double, objective double, cancer_type varchar,  Area float, submit_date
 timestamp, points listdouble,  PRIMARY KEY ((image_caseid),x));*

 Kindly suggest me how to model is data and perform 2d or in general multi
 dimension range queries in Cassandra efficiently.


 Thanks,
 Mehak



Re: 2d or multi dimension range query in cassandra CQL

2015-03-17 Thread 健峰 高
Lucene or Solr is better. I’m using lucene
 在 2015年3月18日,上午2:11,Jack Krupansky jack.krupan...@gmail.com 写道:
 
 1. Create multiple secondary indexes, one for each non-key column you need to 
 index on. Not recommended. Considered an anti-pattern for Cassandra.
 2. Use DSE Search/Solr.
 3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.
 
 -- Jack Krupansky
 
 On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta meme...@cs.stonybrook.edu 
 mailto:meme...@cs.stonybrook.edu wrote:
 Hi,
 
 I want to perform range queries (as in x and y ranges) on a large data 
 billions of rows. 
 CQL allows me to put Non EQ restrictions on only one of the clustering 
 columns. 
 Its not allowing me to filter the data using any other column even with use 
 of Allow Filtering option.
 
 cqlsh:images select * from results1 where image_caseid='mehak' and x  100 
 and y 100 order by x allow filtering;
 code=2200 [Invalid query] message=No indexed columns present in by-columns 
 clause with Equal operator
 
 My table definition is :
 
 CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, 
 analysis_execution_uuid uuid, x  double, y double, loc varchar, w double, h 
 double, normalized varchar, type varchar, filehost varchar, filename varchar, 
 image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, 
 image_mpp_y double, image_width double, image_height double, objective 
 double, cancer_type varchar,  Area float, submit_date timestamp, points 
 listdouble,  PRIMARY KEY ((image_caseid),x));
 
 Kindly suggest me how to model is data and perform 2d or in general multi 
 dimension range queries in Cassandra efficiently.
 
 
 Thanks,
 Mehak
 



2d or multi dimension range query in cassandra CQL

2015-03-17 Thread Mehak Mehta
Hi,

I want to perform range queries (as in x and y ranges) on a large data
billions of rows.
CQL allows me to put Non EQ restrictions on only one of the clustering
columns.
Its not allowing me to filter the data using any other column even with use
of Allow Filtering option.

cqlsh:images *select * from results1 where image_caseid='mehak' and x 
100 and y 100 order by x allow filtering;*
*code=2200 [Invalid query] message=No indexed columns present in
by-columns clause with Equal operator*

My table definition is :

*CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar,
analysis_execution_uuid uuid, x  double, y double, loc varchar, w double, h
double, normalized varchar, type varchar, filehost varchar, filename
varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
image_mpp_x double, image_mpp_y double, image_width double, image_height
double, objective double, cancer_type varchar,  Area float, submit_date
timestamp, points listdouble,  PRIMARY KEY ((image_caseid),x));*

Kindly suggest me how to model is data and perform 2d or in general multi
dimension range queries in Cassandra efficiently.


Thanks,
Mehak