Re: write timeout

2015-03-23 Thread Anishek Agarwal
Forgot to mention I am using Cassandra 2.0.13

On Mon, Mar 23, 2015 at 5:59 PM, Anishek Agarwal anis...@gmail.com wrote:

 Hello,

 I am using a single-node, server-class machine with 16 CPUs and 32GB RAM,
 with a single drive attached to it.

 My table structure is as below:

 CREATE TABLE t1(id bigint, ts timestamp, cat1 set<text>, cat2 set<text>,
 lat float, lon float, a bigint, primary key (id, ts));

 I am trying to insert 300 entries per partition key with 4000 partition
 keys using 25 threads. Configuration:

 write_request_timeout_in_ms: 5000
 concurrent_writes: 32
 heap space : 8GB

 Client-side timeout is 12 sec using the DataStax Java driver.
 Consistency level: ONE

 With the above configuration I try to run it 10 times to eventually
 generate around

 300 * 4000 * 10 = 12,000,000 entries.

 When I run this, after the first few runs I get a WriteTimeout exception at
 the client with the message "1 replica were required but only 0 acknowledged
 the write".

 There are no errors in the server log. Why does this error occur, and how do
 I know what limit I should place on concurrent writes to a single node?


 Looking at iostat, disk utilization seems to be at 1-3% when running this.

 Please let me know if anything else is required.

 Regards,
 Anishek




Re: 2d or multi dimension range query in cassandra CQL

2015-03-23 Thread Asit KAUSHIK
I am using Stratio Cassandra; it is way better than Stargate as it works on
the latest release of Cassandra and performs better for me.

We are using it for a full-text search use case.

Regards
Asit

On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta meme...@cs.stonybrook.edu
wrote:

 Hi,

 Based on some suggestions, I tried using tuplejump for multidimensional
 queries, since the other options mostly needed root permissions (for
 building), which I don't have on my cluster account.

 I found a major problem in tuplejump (stargate-core): when I use it
 with a list-type field in my table, it stops working.
 For example:

 create table person (
 id int primary key,
 isActive boolean,
 age int,
 eyeColor varchar,
 name text,
 gender varchar,
 company varchar,
 email varchar,
 phone varchar,
 address text,
 points list<double>,
 stargate text
 );

 with indexing as:
 CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING
 'com.tuplejump.stargate.RowIndex' WITH options =
 {
 'sg_options': '{
   "fields": {
     "eyeColor": {},
     "age": {},
     "phone": {}
   }
 }'
 };

 If I insert data into the table along with the points list, the following
 query won't return any results (0 rows):

 SELECT * FROM RESULTS1 WHERE stargate = '{
   "filter": {
     "type": "range",
     "field": "x",
     "lower": 0
   }
 }';
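
 (For illustration, a hypothetical insert that populates the list and
 reproduces this, with made-up values:

 INSERT INTO person (id, isActive, age, eyeColor, points)
 VALUES (1, true, 30, 'brown', [1.5, 2.5]);
 )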

 I tried removing points list<double> from the table and it works, i.e. the
 same query will return results.
 Can somebody help me with this problem, as I couldn't find much support
 from Stargate?

 Please note that I am using Cassandra 2.0.9 compatible with Stargate-core
 as given in link (
 http://stargate-core.readthedocs.org/en/latest/quickstart.html).

 Thanks,
 Mehak


 On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña adelap...@stratio.com
 wrote:

 Hi,

 With Stratio Cassandra you can create Lucene based indexes for
 multidimensional queries this way:

 ALTER TABLE images.results1 ADD lucene text;

 CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene)
 USING 'com.stratio.cassandra.index.RowIndex'
 WITH OPTIONS = {
   'refresh_seconds': '1',
   'schema': '{
     fields: {
       image_caseid: {type: "string"},
       x: {type: "double"},
       y: {type: "double"}
     }
   }'
 };

 Then you can perform the query using the dummy column:

 SELECT * FROM images.results1 WHERE lucene = '{
   filter: {type: "boolean", must: [
     {field: "image_caseid", type: "match", value: "mehak"},
     {field: "x", type: "range", lower: 100},
     {field: "y", type: "range", lower: 100}
   ]}
 }';

 However, you can take advantage of partition key to route the query only
 to the nodes owning the data:

 SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene = '{
   filter: {type: "boolean", must: [
     {field: "x", type: "range", lower: 100},
     {field: "y", type: "range", lower: 100}
   ]}
 }';

 Or, even better:

 SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND
 lucene = '{filter: {field: "y", type: "range", lower: 100}}';

 Additionally, if your data are geospatial (latitude and longitude), you
 will soon be able to use the upcoming spatial features.



 2015-03-17 23:01 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu:

 Sorry, I gave you the wrong table definition for the query. Here the
 composite key is image_caseid, x and uuid, which is unique. I have used x in
 the clustering columns to query it, and used a secondary index on the y column.

 1. Example
 cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
 analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
 double, submit_date timestamp, points list<double>, PRIMARY KEY
 ((image_caseid), x, uuid));
 cqlsh:images> create index results1_y on results1(y);

 In the below query you can see I have image_caseid as the partition key,
 which is filtered. Even then it gives the error "No indexed columns
 present":

 cqlsh:images> select * from results1 where image_caseid='mehak' and x >
 100 and y > 100 order by image_caseid asc;
 code=2200 [Invalid query] message="No indexed columns present in
 by-columns clause with Equal operator"

 2. Example
 I also tried including both x and y columns in the composite key; even then
 the query gives the following error:

 cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
 analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
 double, submit_date timestamp, points list<double>, PRIMARY KEY
 ((image_caseid), x, y, uuid));

 cqlsh:images> select * from results1 where image_caseid='mehak' and x >
 100 and y > 100 order by image_caseid asc;
 code=2200 [Invalid query] message="PRIMARY KEY column y cannot be
 restricted (preceding column ColumnDefinition{name=x,
 type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN,
 componentIndex=0, indexName=null, indexType=null} is either not restricted
 or by a non-EQ relation)"

 Thanks,
 Mehak


 On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky 
 jack.krupan...@gmail.com wrote:

 Yeah, you may have to add a dummy column populated with a constant, 

write timeout

2015-03-23 Thread Anishek Agarwal
Hello,

I am using a single-node, server-class machine with 16 CPUs and 32GB RAM,
with a single drive attached to it.

My table structure is as below:

CREATE TABLE t1(id bigint, ts timestamp, cat1 set<text>, cat2 set<text>,
lat float, lon float, a bigint, primary key (id, ts));
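
(For illustration, each write is a plain prepared insert along these lines;
a sketch, not the exact client code:

INSERT INTO t1 (id, ts, cat1, cat2, lat, lon, a) VALUES (?, ?, ?, ?, ?, ?, ?);

bound and executed from each of the 25 threads.)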

I am trying to insert 300 entries per partition key with 4000 partition
keys using 25 threads. Configuration:

write_request_timeout_in_ms: 5000
concurrent_writes: 32
heap space : 8GB

Client-side timeout is 12 sec using the DataStax Java driver.
Consistency level: ONE

With the above configuration I try to run it 10 times to eventually
generate around

300 * 4000 * 10 = 12,000,000 entries.

When I run this, after the first few runs I get a WriteTimeout exception at
the client with the message "1 replica were required but only 0 acknowledged
the write".

There are no errors in the server log. Why does this error occur, and how do
I know what limit I should place on concurrent writes to a single node?


Looking at iostat, disk utilization seems to be at 1-3% when running this.

Please let me know if anything else is required.

Regards,
Anishek


Re: cassandra triggers

2015-03-23 Thread Asit KAUSHIK
Attached is the code; you can follow the process for compiling and using
it.

If anything more is required, please let me know. The jar file has to be put
into /usr/share/cassandra/conf/triggers.
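
Once the jar is in place, the trigger is attached with CQL along these lines
(keyspace/table/trigger names here are placeholders):

CREATE TRIGGER my_trigger ON mykeyspace.mytable
USING 'org.apache.cassandra.triggers.InvertedIndex';
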
Hope this helps

Regards
asit


On Mon, Mar 23, 2015 at 3:20 PM, Rahul Bhardwaj 
rahul.bhard...@indiamart.com wrote:

 Yes Asit, you can share it with me; let's see if we can implement it for
 our requirement.


 Regards:
 Rahul Bhardwaj

 On Mon, Mar 23, 2015 at 1:43 PM, Asit KAUSHIK asitkaushikno...@gmail.com
 wrote:

 Hi Rahul,

 I have created a trigger which inserts a default value into the table,
 but everyone is against using it, as it's external code which may be
 incompatible with future releases. It was a challenge, as all the examples
 are for the old 2.0.x versions, where the RowMutation class is used, which
 was discontinued in later releases.

 If you still want the code I can give it to you. The approach is the same as
 on all the sites I used for reference (below), but again, the code is for an
 older release and would not work:

 http://noflex.org/learn-experiment-cassandra-trigger/


 On Mon, Mar 23, 2015 at 11:01 AM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 Hi All,


 I want to use triggers in Cassandra. Is there any tutorial on creating
 triggers in Cassandra? Also, I am not good at Java.

 Please help!

 Regards:
 Rahul Bhardwaj










InvertedIndex.java
Description: Binary data


Re: write timeout

2015-03-23 Thread Brian Tarbox
My group is seeing the same thing and also cannot figure out why it's
happening.

On Mon, Mar 23, 2015 at 8:36 AM, Anishek Agarwal anis...@gmail.com wrote:

 Forgot to mention I am using Cassandra 2.0.13

 On Mon, Mar 23, 2015 at 5:59 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 Hello,

 I am using a single-node, server-class machine with 16 CPUs and 32GB RAM,
 with a single drive attached to it.

 My table structure is as below:

 CREATE TABLE t1(id bigint, ts timestamp, cat1 set<text>, cat2 set<text>,
 lat float, lon float, a bigint, primary key (id, ts));

 I am trying to insert 300 entries per partition key with 4000 partition
 keys using 25 threads. Configuration:

 write_request_timeout_in_ms: 5000
 concurrent_writes: 32
 heap space : 8GB

 Client-side timeout is 12 sec using the DataStax Java driver.
 Consistency level: ONE

 With the above configuration I try to run it 10 times to eventually
 generate around

 300 * 4000 * 10 = 12,000,000 entries.

 When I run this, after the first few runs I get a WriteTimeout exception at
 the client with the message "1 replica were required but only 0 acknowledged
 the write".

 There are no errors in the server log. Why does this error occur, and how do
 I know what limit I should place on concurrent writes to a single node?


 Looking at iostat, disk utilization seems to be at 1-3% when running this.

 Please let me know if anything else is required.

 Regards,
 Anishek





-- 
http://about.me/BrianTarbox


Re: cassandra triggers

2015-03-23 Thread Jason Wee
okay, if you leave a comment on the blog about what is breaking and which
Cassandra version, I can take a look at the code when I get the time. :-)

jason

On Mon, Mar 23, 2015 at 8:15 PM, Asit KAUSHIK asitkaushikno...@gmail.com
wrote:

 Attached is the code; you can follow the process for compiling and using
 it.

 If anything more is required, please let me know. The jar file has to be
 put into /usr/share/cassandra/conf/triggers.
 Hope this helps

 Regards
 asit


 On Mon, Mar 23, 2015 at 3:20 PM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 Yes Asit, you can share it with me; let's see if we can implement it for
 our requirement.


 Regards:
 Rahul Bhardwaj

 On Mon, Mar 23, 2015 at 1:43 PM, Asit KAUSHIK asitkaushikno...@gmail.com
  wrote:

 Hi Rahul,

 I have created a trigger which inserts a default value into the table,
 but everyone is against using it, as it's external code which may be
 incompatible with future releases. It was a challenge, as all the examples
 are for the old 2.0.x versions, where the RowMutation class is used, which
 was discontinued in later releases.

 If you still want the code I can give it to you. The approach is the same as
 on all the sites I used for reference (below), but again, the code is for an
 older release and would not work:

 http://noflex.org/learn-experiment-cassandra-trigger/


 On Mon, Mar 23, 2015 at 11:01 AM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 Hi All,


 I want to use triggers in Cassandra. Is there any tutorial on creating
 triggers in Cassandra? Also, I am not good at Java.

 Please help!

 Regards:
 Rahul Bhardwaj












Re: Unknown CF / Schema OK

2015-03-23 Thread Tim Olson
I did figure this out:

When adding a columnfamily, the query timed out before all nodes replied,
and I sent the schema out again.  Half the nodes ended up with the CF
having UUID A and half the nodes ended up with the new CF but UUID B.
UnknownColumnFamilyExceptions were thrown until the enqueued data exceeded
memory.  Eventually one half of the nodes crashed, with the other half
having a consistent view of the CF.  At this point I just dropped the
offending CF schema in the active cluster, then the downed nodes could be
re-added successfully.  We lost some data.  :(



On Sun, Mar 22, 2015 at 11:39 AM, Tim Olson kash...@gmail.com wrote:

 After upgrading a schema, I'm getting lots of
 UnknownColumnFamilyException in the logs. However, all nodes have the
 same schema as reported by nodetool describecluster. I queried the
 system tables for the given column family UUID, but it doesn't appear in
 any of the schemas on any of the nodes. I restarted all clients, but that
 didn't help either.
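
 (For reference, the UUID lookup against the pre-3.0 schema tables goes
 along these lines, as a sketch:

 SELECT keyspace_name, columnfamily_name, cf_id FROM system.schema_columnfamilies;

 comparing cf_id against the UUID from the exceptions.)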

 The cluster was running 2.1.2 but I recently upgraded to 2.1.3.

 Any ideas?  This is basically making our production cluster highly
 unresponsive.

 Tim



Re: cassandra triggers

2015-03-23 Thread Rahul Bhardwaj
Yes Asit, you can share it with me; let's see if we can implement it for
our requirement.


Regards:
Rahul Bhardwaj

On Mon, Mar 23, 2015 at 1:43 PM, Asit KAUSHIK asitkaushikno...@gmail.com
wrote:

 Hi Rahul,

 I have created a trigger which inserts a default value into the table,
 but everyone is against using it, as it's external code which may be
 incompatible with future releases. It was a challenge, as all the examples
 are for the old 2.0.x versions, where the RowMutation class is used, which
 was discontinued in later releases.

 If you still want the code I can give it to you. The approach is the same as
 on all the sites I used for reference (below), but again, the code is for an
 older release and would not work:

 http://noflex.org/learn-experiment-cassandra-trigger/


 On Mon, Mar 23, 2015 at 11:01 AM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 Hi All,


 I want to use triggers in Cassandra. Is there any tutorial on creating
 triggers in Cassandra? Also, I am not good at Java.

 Please help!

 Regards:
 Rahul Bhardwaj








Re: Really high read latency

2015-03-23 Thread Dave Galbraith
Duncan: I'm thinking it might be something like that. I'm also seeing just
a ton of garbage collection on the box; could it be pulling rows for all
100k attrs for a given row_time into memory, since only row_time is the
partition key?

Jens: I'm not using EBS (although I used to until I read up on how useless
it is). I'm not sure what constitutes proper paging but my client has a
pretty small amount of available memory so I'm doing pages of size 5k using
the C++ Datastax driver.

Thanks for the replies!

-Dave

On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil jens.ran...@tink.se wrote:

 Also, two control questions:

- Are you using EBS for data storage? It might introduce additional
latencies.
- Are you doing proper paging when querying the keyspace?

 Cheers,
 Jens

 On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith 
 david92galbra...@gmail.com wrote:

 Hi! So I've got a table like this:

 CREATE TABLE default.metrics (row_time int,attrs varchar,offset
 int,value double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT
 STORAGE AND bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND
 comment='' AND dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND
 index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true'
 AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND
 speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND
 compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
 AND compression={'sstable_compression':'LZ4Compressor'};

 and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4
 GB of heap space. It's time series data: I increment row_time each day,
 attrs is additional identifying information about each series, and
 offset is the number of milliseconds into the day for each data point.
 For the past 5 days, I've been inserting 3k points/second distributed
 across 100k distinct attrses. And now when I try to run queries on this
 data that look like

 SELECT * FROM default.metrics WHERE row_time = 5 AND attrs =
 'potatoes_and_jam'

 it takes an absurdly long time and sometimes just times out. I did
 nodetool cfstats default and here's what I get:

 Keyspace: default
 Read Count: 59
 Read Latency: 397.12523728813557 ms.
 Write Count: 155128
 Write Latency: 0.3675690719921613 ms.
 Pending Flushes: 0
 Table: metrics
 SSTable count: 26
 Space used (live): 35146349027
 Space used (total): 35146349027
 Space used by snapshots (total): 0
 SSTable Compression Ratio: 0.10386468749216264
 Memtable cell count: 141800
 Memtable data size: 31071290
 Memtable switch count: 41
 Local read count: 59
 Local read latency: 397.126 ms
 Local write count: 155128
 Local write latency: 0.368 ms
 Pending flushes: 0
 Bloom filter false positives: 0
 Bloom filter false ratio: 0.0
 Bloom filter space used: 2856
 Compacted partition minimum bytes: 104
 Compacted partition maximum bytes: 36904729268
 Compacted partition mean bytes: 986530969
 Average live cells per slice (last five minutes):
 501.66101694915255
 Maximum live cells per slice (last five minutes): 502.0
 Average tombstones per slice (last five minutes): 0.0
 Maximum tombstones per slice (last five minutes): 0.0

 Ouch! 400ms of read latency, orders of magnitude higher than it has any
 right to be. How could this have happened? Is there something fundamentally
 broken about my data model? Thanks!




 --
 Jens Rantil
 Backend engineer
 Tink AB

 Email: jens.ran...@tink.se
 Phone: +46 708 84 18 32
 Web: www.tink.se




Re: Really high read latency

2015-03-23 Thread Jens Rantil
Also, two control questions:

   - Are you using EBS for data storage? It might introduce additional
   latencies.
   - Are you doing proper paging when querying the keyspace?

Cheers,
Jens

On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith david92galbra...@gmail.com
wrote:

 Hi! So I've got a table like this:

 CREATE TABLE default.metrics (row_time int,attrs varchar,offset
 int,value double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT
 STORAGE AND bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND
 comment='' AND dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND
 index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true'
 AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND
 speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND
 compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
 AND compression={'sstable_compression':'LZ4Compressor'};

 and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4
 GB of heap space. It's time series data: I increment row_time each day,
 attrs is additional identifying information about each series, and
 offset is the number of milliseconds into the day for each data point.
 For the past 5 days, I've been inserting 3k points/second distributed
 across 100k distinct attrses. And now when I try to run queries on this
 data that look like

 SELECT * FROM default.metrics WHERE row_time = 5 AND attrs =
 'potatoes_and_jam'

 it takes an absurdly long time and sometimes just times out. I did
 nodetool cfstats default and here's what I get:

 Keyspace: default
 Read Count: 59
 Read Latency: 397.12523728813557 ms.
 Write Count: 155128
 Write Latency: 0.3675690719921613 ms.
 Pending Flushes: 0
 Table: metrics
 SSTable count: 26
 Space used (live): 35146349027
 Space used (total): 35146349027
 Space used by snapshots (total): 0
 SSTable Compression Ratio: 0.10386468749216264
 Memtable cell count: 141800
 Memtable data size: 31071290
 Memtable switch count: 41
 Local read count: 59
 Local read latency: 397.126 ms
 Local write count: 155128
 Local write latency: 0.368 ms
 Pending flushes: 0
 Bloom filter false positives: 0
 Bloom filter false ratio: 0.0
 Bloom filter space used: 2856
 Compacted partition minimum bytes: 104
 Compacted partition maximum bytes: 36904729268
 Compacted partition mean bytes: 986530969
 Average live cells per slice (last five minutes):
 501.66101694915255
 Maximum live cells per slice (last five minutes): 502.0
 Average tombstones per slice (last five minutes): 0.0
 Maximum tombstones per slice (last five minutes): 0.0

 Ouch! 400ms of read latency, orders of magnitude higher than it has any
 right to be. How could this have happened? Is there something fundamentally
 broken about my data model? Thanks!




-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: Logging client ID for YCSB workloads on Cassandra?

2015-03-23 Thread Jatin Ganhotra
Sure. I updated the YCSB code to pass a client ID as an input parameter,
stored the clientID in the properties, and used it in the DBWrapper class
for per-operation logging:
YCSB/core/src/main/java/com/yahoo/ycsb/DBWrapper.java

Please let me know if you need more information and I can share my code
samples, if that helps.

—
Jatin Ganhotra
Graduate Student, Computer Science
University of Illinois at Urbana Champaign
http://jatinganhotra.com
http://linkedin.com/in/jatinganhotra


On Fri, Mar 20, 2015 at 12:03 PM, Jan cne...@yahoo.com wrote:

 Hi Jatin,

 Besides enabling tracing, is there any other way to get the task done
 (to log the client ID for every operation)?
 Please share the solution with the community, so that we can
 collectively learn from your experience.

 cheers
 Jan/



   On Friday, February 20, 2015 12:48 PM, Jatin Ganhotra 
 jatin.ganho...@gmail.com wrote:


 Never mind, got it working.

 Thanks :)

 —
 Jatin Ganhotra
 Graduate Student, Computer Science
 University of Illinois at Urbana Champaign
 http://jatinganhotra.com
 http://linkedin.com/in/jatinganhotra


 On Wed, Feb 18, 2015 at 7:09 PM, Jatin Ganhotra jatin.ganho...@gmail.com
 wrote:

 Hi,

 I'd like to log the client ID for every operation performed by YCSB on
 my Cassandra cluster.

 The purpose is to identify and analyze various consistency measures
 other than eventual consistency.

 I wanted to know if people have done something similar in the past. Or am
 I missing something really basic here?

 Please let me know if you need more information. Thanks
 —
 Jatin Ganhotra







Fwd: [RELEASE] Kundera-2.16 (Added support for Cassandra's UDTs)

2015-03-23 Thread karthik prasad


On Tuesday, 17 March 2015 22:42:30 UTC+5:30, Chhavi Gangwal wrote:

 Hi All,
  
 We are happy to announce Kundera-2.16 release.
  
 Kundera is a JPA 2.1 compliant, polyglot object-datastore mapping library 
 for NoSQL datastores. The idea behind Kundera is to make working with NoSQL 
 databases drop-dead simple and fun. It currently supports Cassandra, HBase, 
 MongoDB, Redis, OracleNoSQL, Neo4j, ElasticSearch, CouchDB and relational 
 databases.

 Major Changes in 2.16 and 2.15.1:
 =
   1) Support added for the Cassandra-2.1.x version.
   2) Support for Cassandra User Defined Types as embeddables (see the CQL
      sketch below).
   3) Aggregation support available with Elasticsearch is also enabled in
      Kundera.
   4) HBase data remodeling, with support for the HBase-1.0 version.

   * Support for aggregate functions is also extended to other Kundera
     clients using Elasticsearch as the indexing store.
   * Support for HBase version 1.0 with the revised data model is available
     via Kundera's kundera-hbase-v2 dependency. Further details on this will
     be added to the wiki soon.
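
 As a quick refresher on the UDT feature in (2), Cassandra UDTs look like 
 this in CQL (type and table names below are made up for illustration):

   CREATE TYPE address (street text, city text, zip int);
   CREATE TABLE users (id int PRIMARY KEY, home frozen<address>);

 Kundera maps such types onto JPA embeddables.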
 
  
 Github Bug Fixes :
 =
   https://github.com/impetus-opensource/Kundera/issues/716
   https://github.com/impetus-opensource/Kundera/issues/714
   https://github.com/impetus-opensource/Kundera/issues/659
   https://github.com/impetus-opensource/Kundera/issues/641
   https://github.com/impetus-opensource/Kundera/issues/708
   https://github.com/impetus-opensource/Kundera/issues/707
   https://github.com/impetus-opensource/Kundera/issues/693
   https://github.com/impetus-opensource/Kundera/issues/672
  
  
 How to Download:
 =
 To download, use or contribute to Kundera, visit:
 http://github.com/impetus-opensource/Kundera
  
 Latest release of Kundera's tag is 2.16 whose maven libraries are now 
 available at:
  https://oss.sonatype.org/content/repositories/releases/com/impetus.
  
 The 2.16 release of Kundera is compatible with Cassandra 2.x and requires 
 JDK 1.7 as one of its prerequisites.
  
 Older versions of Cassandra (1.x) over JPA 2.0 can be used with archived 
 versions of Kundera and its current release branch, Kundera-2.12-1.x, 
 hosted at:
 https://github.com/impetus-opensource/Kundera/releases/tag/kundera-2.12-1.x
 whose maven libraries are also available at:
 https://oss.sonatype.org/content/repositories/releases/com/impetus
  
 Sample code and examples for using Kundera can be found here:
 https://github.com/impetus-opensource/Kundera/tree/trunk/src/kundera-tests
  
 Troubleshooting :
 ===
  In case you are using the 2.16 version of Kundera with Cassandra, make sure 
 you have JDK 1.7 installed. Also, if you wish to use UDT support with 
 Cassandra, please enable CQL3 before performing any operations.
  
 Please share your feedback with us by filling in a simple survey:
 http://www.surveymonkey.com/s/BMB9PWG
  
 Thank you all for your contributions and using Kundera!
  
 Regards,
 Kundera Team
 Follow us on twitter (https://twitter.com/kundera_impetus) and linkedin 
 (http://in.linkedin.com/pub/kundera-impetus/b4/870/153).



Cassandra time series + Spark

2015-03-23 Thread Rumph, Frens Jan
Hi,

I'm working on a system which has to deal with time series data. I've been
happy using Cassandra for time series and Spark looks promising as a
computational platform.

I consider chunking time series in Cassandra necessary, e.g. by 3 weeks as
kairosdb does it. This allows an 8 byte chunk start timestamp with 4 byte
offsets for the individual measurements. And it keeps the number of columns
per partition below 2x10^9 even at 1000 Hz.
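
Roughly, the chunked schema looks like this (a simplified sketch; column
names are illustrative):

CREATE TABLE timeseries (
    series_id text,
    chunk_start timestamp,  -- start of the 3-week chunk
    offset int,             -- offset of the measurement within the chunk
    value double,
    PRIMARY KEY ((series_id, chunk_start), offset)
);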

This schema works quite okay when dealing with one time series at a time.
Because the data is partitioned by time series id and chunk of time (e.g.
the three weeks mentioned above), it requires a little client side logic to
retrieve the partitions and glue them together, but this is quite okay.

However, when working with many / all of the time series in a table at
once, e.g. in Spark, the story changes dramatically. Say I want to compute
something as simple as a moving average: I have to deal with data all over
the place. I can't currently think of anything but performing an
aggregateByKey, causing a shuffle every time.

Anyone have experience with combining time series chunking and computation
on all / many time series at once? Any advice?

Cheers,
Frens Jan


Re: cassandra triggers

2015-03-23 Thread Asit KAUSHIK
Hi Rahul,

I have created a trigger which inserts a default value into the table,
but everyone is against using it, as it's external code which may be
incompatible with future releases. It was a challenge, as all the examples
are for the old 2.0.x versions, where the RowMutation class is used, which
was discontinued in later releases.

If you still want the code I can give it to you. The approach is the same as
on all the sites I used for reference (below), but again, the code is for an
older release and would not work:

http://noflex.org/learn-experiment-cassandra-trigger/


On Mon, Mar 23, 2015 at 11:01 AM, Rahul Bhardwaj 
rahul.bhard...@indiamart.com wrote:

 Hi All,


 I want to use triggers in Cassandra. Is there any tutorial on creating
 triggers in Cassandra? Also, I am not good at Java.

 Please help!

 Regards:
 Rahul Bhardwaj




Re: Really high read latency

2015-03-23 Thread Chris Lohfink
  Compacted partition maximum bytes: 36904729268

that's huge... 36GB rows are going to cause a lot of problems. Even when you
specify a precise cell under this, it still has an enormous column index to
deserialize on every read of the partition. As mentioned above, you should
include your attribute name in the partition key ((row_time, attrs)) to
spread this out... I'd call that critical.
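
For example, the table could be declared with a compound partition key like
this (a sketch based on the DDL earlier in the thread, other table options
omitted):

CREATE TABLE default.metrics (
    row_time int,
    attrs varchar,
    offset int,
    value double,
    PRIMARY KEY ((row_time, attrs), offset)
) WITH COMPACT STORAGE;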

Chris

On Mon, Mar 23, 2015 at 4:13 PM, Dave Galbraith david92galbra...@gmail.com
wrote:

 I haven't deleted anything. Here's output from a traced cqlsh query (I
 tried to make the spaces line up, hope it's legible):

 activity (source 172.31.32.211 for all rows) | timestamp | source_elapsed (us)
 Execute CQL3 query | 2015-03-23 21:04:37.422000 | 0
 Parsing select * from default.metrics where row_time = 16511 and attrs = '[redacted]' limit 100; [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 93
 Preparing statement [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 696
 Executing single-partition query on metrics [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 2807
 Acquiring sstable references [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 2993
 Merging memtable tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:37.426000 | 3049
 Partition index with 484338 entries found for sstable 15966 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202304
 Seeking to partition indexed section in data file [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202354
 Bloom filter allows skipping sstable 5613 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202445
 Bloom filter allows skipping sstable 5582 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202478
 Bloom filter allows skipping sstable 5611 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202508
 Bloom filter allows skipping sstable 5610 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202539
 Bloom filter allows skipping sstable 5549 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202678
 Bloom filter allows skipping sstable 5544 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202720
 Bloom filter allows skipping sstable 5237 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202752
 Bloom filter allows skipping sstable 2516 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202782
 Bloom filter allows skipping sstable 2632 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202812
 Bloom filter allows skipping sstable 3015 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202852
 Skipped 0/11 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202882
 Merging data from memtables and 1 sstables [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202902
 Read 101 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-03-23 21:04:38.626000 | 203752
 Request complete | 2015-03-23 21:04:38.628253 | 206253

 On Mon, Mar 23, 2015 at 11:53 AM, Eric Stevens migh...@gmail.com wrote:

 Enable tracing in cqlsh and see how many sstables are being lifted to
 satisfy the query (are you repeatedly writing to the same partition
 [row_time] over time?).

 Also watch for whether you're hitting a lot of tombstones (are you
 deleting lots of values in the same partition over time?).

 On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith 
 david92galbra...@gmail.com wrote:

 Duncan: I'm thinking it might be something like that. I'm also seeing just
 a ton of garbage collection on the box; could it be pulling rows for all
 100k attrs for a given row_time into memory, since only row_time is the
 partition key?

 Jens: I'm not using EBS (although I used to until I read up on how
 useless it is). I'm not sure what constitutes proper paging but my client
 has a pretty small amount of available memory so I'm doing pages of size 5k
 using the C++ Datastax driver.

 Thanks for the replies!

 -Dave

 On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil jens.ran...@tink.se
 wrote:

 Also, two control questions:

- Are you using EBS for data storage? It might introduce additional
latencies.
- Are you doing proper paging when querying the 

Re: Really high read latency

2015-03-23 Thread Dave Galbraith
I haven't deleted anything. Here's output from a traced cqlsh query (I
tried to make the spaces line up, hope it's legible):

activity (source 172.31.32.211 for all rows) | timestamp | source_elapsed (us)
Execute CQL3 query | 2015-03-23 21:04:37.422000 | 0
Parsing select * from default.metrics where row_time = 16511 and attrs = '[redacted]' limit 100; [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 93
Preparing statement [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 696
Executing single-partition query on metrics [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 2807
Acquiring sstable references [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 2993
Merging memtable tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:37.426000 | 3049
Partition index with 484338 entries found for sstable 15966 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202304
Seeking to partition indexed section in data file [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202354
Bloom filter allows skipping sstable 5613 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202445
Bloom filter allows skipping sstable 5582 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202478
Bloom filter allows skipping sstable 5611 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202508
Bloom filter allows skipping sstable 5610 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202539
Bloom filter allows skipping sstable 5549 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202678
Bloom filter allows skipping sstable 5544 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202720
Bloom filter allows skipping sstable 5237 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202752
Bloom filter allows skipping sstable 2516 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202782
Bloom filter allows skipping sstable 2632 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202812
Bloom filter allows skipping sstable 3015 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202852
Skipped 0/11 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202882
Merging data from memtables and 1 sstables [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202902
Read 101 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-03-23 21:04:38.626000 | 203752
Request complete | 2015-03-23 21:04:38.628253 | 206253

On Mon, Mar 23, 2015 at 11:53 AM, Eric Stevens migh...@gmail.com wrote:

 Enable tracing in cqlsh and see how many sstables are being lifted to
 satisfy the query (are you repeatedly writing to the same partition
 [row_time] over time?).

 Also watch for whether you're hitting a lot of tombstones (are you
 deleting lots of values in the same partition over time?).

 On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith 
 david92galbra...@gmail.com wrote:

 Duncan: I'm thinking it might be something like that. I'm also seeing just
 a ton of garbage collection on the box; could it be pulling rows for all
 100k attrs for a given row_time into memory, since only row_time is the
 partition key?

 Jens: I'm not using EBS (although I used to until I read up on how
 useless it is). I'm not sure what constitutes proper paging but my client
 has a pretty small amount of available memory so I'm doing pages of size 5k
 using the C++ Datastax driver.

 Thanks for the replies!

 -Dave

 On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil jens.ran...@tink.se wrote:

 Also, two control questions:

- Are you using EBS for data storage? It might introduce additional
latencies.
- Are you doing proper paging when querying the keyspace?

 Cheers,
 Jens

 On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith 
 david92galbra...@gmail.com wrote:

 Hi! So I've got a table like this:

 CREATE TABLE default.metrics (row_time int,attrs varchar,offset
 int,value double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT
 STORAGE AND bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND
 comment='' AND dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND
 index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true'
 AND populate_io_cache_on_flush='false' AND 

Re: 2d or multi dimension range query in cassandra CQL

2015-03-23 Thread Mehak Mehta
Hi,

I checked Stratio Cassandra but couldn't find any good documentation for
it. Can you give me some pointers on how to use it?
Do I have to build it from source, or can I use it directly with jar
files as in the case of Stargate?
I was looking for a solution that doesn't need a full build and can be
used with an existing tar of Cassandra, because I have some restrictions on
installing stuff on my server.

Thanks,
Mehak

On Mon, Mar 23, 2015 at 8:17 AM, Asit KAUSHIK asitkaushikno...@gmail.com
wrote:

 I am using Stratio Cassandra; it is way better than Stargate as it works on
 the latest release of Cassandra and performs better for me.

 We are using it for a full-text search use case.

 Regards
 Asit

 On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta meme...@cs.stonybrook.edu
 wrote:

 Hi,

 Based on some suggestions, I tried using tuplejump for multidimensional
 queries, since the other options mostly needed root permissions (for
 building), which I don't have on my cluster account.

 I found a major problem in tuplejump (stargate-core): when I use it
 with a list-type field in my table, it stops working.
 For example:

 create table person (
 id int primary key,
 isActive boolean,
 age int,
 eyeColor varchar,
 name text,
 gender varchar,
 company varchar,
 email varchar,
 phone varchar,
 address text,
 points list<double>,
 stargate text
 );

 with indexing as:
 CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING
 'com.tuplejump.stargate.RowIndex' WITH options =
 {
 'sg_options': '{
   "fields": {
     "eyeColor": {},
     "age": {},
     "phone": {}
   }
 }'
 };

 If I insert data into the table along with the points list, the following
 query won't return any results (0 rows):

 SELECT * FROM RESULTS1 WHERE stargate = '{
   "filter": {
     "type": "range",
     "field": "x",
     "lower": 0
   }
 }';

 I tried removing points list<double> from the table and it works, i.e. the
 same query will return results.
 Can somebody help me with this problem, as I couldn't find much support
 from Stargate?

 Please note that I am using Cassandra 2.0.9 compatible with Stargate-core
 as given in link (
 http://stargate-core.readthedocs.org/en/latest/quickstart.html).

 Thanks,
 Mehak


 On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña adelap...@stratio.com
  wrote:

 Hi,

 With Stratio Cassandra you can create Lucene based indexes for
 multidimensional queries this way:

 ALTER TABLE images.results1 ADD lucene text;

 CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene)
 USING 'com.stratio.cassandra.index.RowIndex'
 WITH OPTIONS = {
   'refresh_seconds': '1',
   'schema': '{
     fields: {
       image_caseid: {type: "string"},
       x: {type: "double"},
       y: {type: "double"}
     }
   }'
 };

 Then you can perform the query using the dummy column:

 SELECT * FROM images.results1 WHERE lucene = '{
   filter: {type: "boolean", must: [
     {field: "image_caseid", type: "match", value: "mehak"},
     {field: "x", type: "range", lower: 100},
     {field: "y", type: "range", lower: 100}
   ]}
 }';

 However, you can take advantage of partition key to route the query only
 to the nodes owning the data:

 SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene = '{
   filter: {type: "boolean", must: [
     {field: "x", type: "range", lower: 100},
     {field: "y", type: "range", lower: 100}
   ]}
 }';

 Or, even better:

 SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND
 lucene = '{filter: {field: "y", type: "range", lower: 100}}';

 Additionally, if your data are geospatial (latitude and longitude), you
 will soon be able to use the upcoming spatial features.



 2015-03-17 23:01 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu:

 Sorry, I gave you the wrong table definition for the query. Here the
 composite key is image_caseid, x and uuid, which is unique. I have used x in
 the clustering columns to query it, and used a secondary index on the y column.

 1. Example
 cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
 analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
 double, submit_date timestamp, points list<double>, PRIMARY KEY
 ((image_caseid), x, uuid));
 cqlsh:images> create index results1_y on results1(y);

 In the below query you can see I have image_caseid as the partition key,
 which is filtered. Even then it gives the error "No indexed columns
 present":

 cqlsh:images> select * from results1 where image_caseid='mehak' and x >
 100 and y > 100 order by image_caseid asc;
 code=2200 [Invalid query] message="No indexed columns present in
 by-columns clause with Equal operator"

 2. Example
 I also tried including both x and y columns in the composite key; even then
 the query gives the following error:

 cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
 analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
 double, submit_date timestamp, points list<double>, PRIMARY KEY
 ((image_caseid), x, y, uuid));

 cqlsh:images> select * from results1 where 

Re: cassandra triggers

2015-03-23 Thread Robert Coli
On Sun, Mar 22, 2015 at 10:31 PM, Rahul Bhardwaj 
rahul.bhard...@indiamart.com wrote:

 I want to use triggers in Cassandra. Is there any tutorial on creating
 triggers in Cassandra?


For the record, it is my understanding that you almost certainly should not
use the current Cassandra triggers in production for any real work.

=Rob


Re: write timeout

2015-03-23 Thread Robert Coli
On Mon, Mar 23, 2015 at 7:27 AM, Brian Tarbox briantar...@gmail.com wrote:

 My group is seeing the same thing and also cannot figure out why it's
 happening.

 On Mon, Mar 23, 2015 at 8:36 AM, Anishek Agarwal anis...@gmail.com
 wrote:

 Forgot to mention I am using Cassandra 2.0.13


This seems like a rather significant bug in the most recent stable version.
In this case, I would tend to file a JIRA first and then ask the mailing
list second.

Could one or both of you file steps-to-reproduce with a JIRA at
http://issues.apache.org?

=Rob


Re: Really high read latency

2015-03-23 Thread Eric Stevens
Enable tracing in cqlsh and see how many sstables are being lifted to
satisfy the query (are you repeatedly writing to the same partition
[row_time] over time?).
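
(In cqlsh that is simply: TRACING ON; then re-run the SELECT and the trace is
printed along with the result.)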

Also watch for whether you're hitting a lot of tombstones (are you deleting
lots of values in the same partition over time?).

On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith david92galbra...@gmail.com
wrote:

 Duncan: I'm thinking it might be something like that. I'm also seeing just
 a ton of garbage collection on the box; could it be pulling rows for all
 100k attrs for a given row_time into memory, since only row_time is the
 partition key?

 Jens: I'm not using EBS (although I used to until I read up on how useless
 it is). I'm not sure what constitutes proper paging but my client has a
 pretty small amount of available memory so I'm doing pages of size 5k using
 the C++ Datastax driver.

 Thanks for the replies!

 -Dave

 On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil jens.ran...@tink.se wrote:

 Also, two control questions:

- Are you using EBS for data storage? It might introduce additional
latencies.
- Are you doing proper paging when querying the keyspace?

 Cheers,
 Jens

 On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith 
 david92galbra...@gmail.com wrote:

 Hi! So I've got a table like this:

 CREATE TABLE default.metrics (row_time int,attrs varchar,offset
 int,value double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT
 STORAGE AND bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND
 comment='' AND dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND
 index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true'
 AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND
 speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND
 compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
 AND compression={'sstable_compression':'LZ4Compressor'};

 and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4
 GB of heap space. It's time series data: I increment row_time each day,
 attrs is additional identifying information about each series, and
 offset is the number of milliseconds into the day for each data point.
 For the past 5 days, I've been inserting 3k points/second distributed
 across 100k distinct attrses. And now when I try to run queries on this
 data that look like

 SELECT * FROM default.metrics WHERE row_time = 5 AND attrs =
 'potatoes_and_jam'

 it takes an absurdly long time and sometimes just times out. I did
 nodetool cfstats default and here's what I get:

 Keyspace: default
 Read Count: 59
 Read Latency: 397.12523728813557 ms.
 Write Count: 155128
 Write Latency: 0.3675690719921613 ms.
 Pending Flushes: 0
 Table: metrics
 SSTable count: 26
 Space used (live): 35146349027
 Space used (total): 35146349027
 Space used by snapshots (total): 0
 SSTable Compression Ratio: 0.10386468749216264
 Memtable cell count: 141800
 Memtable data size: 31071290
 Memtable switch count: 41
 Local read count: 59
 Local read latency: 397.126 ms
 Local write count: 155128
 Local write latency: 0.368 ms
 Pending flushes: 0
 Bloom filter false positives: 0
 Bloom filter false ratio: 0.0
 Bloom filter space used: 2856
 Compacted partition minimum bytes: 104
 Compacted partition maximum bytes: 36904729268
 Compacted partition mean bytes: 986530969
 Average live cells per slice (last five minutes):
 501.66101694915255
 Maximum live cells per slice (last five minutes): 502.0
 Average tombstones per slice (last five minutes): 0.0
 Maximum tombstones per slice (last five minutes): 0.0

 Ouch! 400ms of read latency, orders of magnitude higher than it has any
 right to be. How could this have happened? Is there something fundamentally
 broken about my data model? Thanks!




 --
 Jens Rantil
 Backend engineer
 Tink AB

 Email: jens.ran...@tink.se
 Phone: +46 708 84 18 32
 Web: www.tink.se






Re: 2d or multi dimension range query in cassandra CQL

2015-03-23 Thread Andres de la Peña
Hi,

You can download Stratio Cassandra binaries from
https://s3.amazonaws.com/stratioorg/cassandra/stratio-cassandra-2.1.3.1-bin.tar.gz

You can get info about how to build and getting started at its README file
https://github.com/Stratio/stratio-cassandra/blob/master/README.md. More
detailed info can be found at
https://github.com/Stratio/stratio-cassandra/blob/master/doc/stratio/extended-search-in-cassandra.md
.

Regards,

2015-03-23 18:07 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu:

 Hi,

 I checked Stratio Cassandra but couldn't find any good documentation for
 it. Can you give me some pointers on how to use it?
 Do I have to build it from source, or can I use it directly with jar
 files as in the case of Stargate?
 I was looking for a solution that doesn't need a full build and can be
 used with an existing tar of Cassandra, because I have some restrictions on
 installing stuff on my server.

 Thanks,
 Mehak

 On Mon, Mar 23, 2015 at 8:17 AM, Asit KAUSHIK asitkaushikno...@gmail.com
 wrote:

 I am using Stratio Cassandra; it is way better than Stargate as it works on
 the latest release of Cassandra and performs better for me.

 We are using it for a full-text search use case.

 Regards
 Asit

 On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta meme...@cs.stonybrook.edu
 wrote:

 Hi,

 Based on some suggestions, I tried using tuplejump for multidimensional
 queries, since the other options mostly needed root permissions (for
 building), which I don't have on my cluster account.

 I found a major problem in tuplejump (stargate-core): when I use it
 with a list-type field in my table, it stops working.
 For example:

 create table person (
 id int primary key,
 isActive boolean,
 age int,
 eyeColor varchar,
 name text,
 gender varchar,
 company varchar,
 email varchar,
 phone varchar,
 address text,
 points list<double>,
 stargate text
 );

 with indexing as:
 CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING
 'com.tuplejump.stargate.RowIndex' WITH options =
 {
 'sg_options': '{
   "fields": {
     "eyeColor": {},
     "age": {},
     "phone": {}
   }
 }'
 };

 If I insert data into the table along with the points list, the following
 query won't return any results (0 rows):

 SELECT * FROM RESULTS1 WHERE stargate = '{
   "filter": {
     "type": "range",
     "field": "x",
     "lower": 0
   }
 }';

 I tried removing points list<double> from the table and it works, i.e. the
 same query will return results.
 Can somebody help me with this problem, as I couldn't find much support
 from Stargate?

 Please note that I am using Cassandra 2.0.9 compatible with
 Stargate-core as given in link (
 http://stargate-core.readthedocs.org/en/latest/quickstart.html).

 Thanks,
 Mehak


 On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña 
 adelap...@stratio.com wrote:

 Hi,

 With Stratio Cassandra you can create Lucene based indexes for
 multidimensional queries this way:

 ALTER TABLE images.results1 ADD lucene text;

 CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene)
 USING 'com.stratio.cassandra.index.RowIndex'
 WITH OPTIONS = {
   'refresh_seconds': '1',
   'schema': '{
     fields: {
       image_caseid: {type: "string"},
       x: {type: "double"},
       y: {type: "double"}
     }
   }'
 };

 Then you can perform the query using the dummy column:

 SELECT * FROM images.results1 WHERE lucene = '{
   filter: {type: "boolean", must: [
     {field: "image_caseid", type: "match", value: "mehak"},
     {field: "x", type: "range", lower: 100},
     {field: "y", type: "range", lower: 100}
   ]}
 }';

 However, you can take advantage of partition key to route the query
 only to the nodes owning the data:

 SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene = '{
   filter: {type: "boolean", must: [
     {field: "x", type: "range", lower: 100},
     {field: "y", type: "range", lower: 100}
   ]}
 }';

 Or, even better:

 SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND
 lucene = '{filter: {field: "y", type: "range", lower: 100}}';

 Additionally, if your data are geospatial (latitude and longitude), you
 will soon be able to use the upcoming spatial features.



 2015-03-17 23:01 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu:

 Sorry, I gave you the wrong table definition for the query. Here the
 composite key is image_caseid, x and uuid, which is unique. I have used x in
 the clustering columns to query it, and used a secondary index on the y column.

 1. Example
 cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
 analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
 double, submit_date timestamp, points list<double>, PRIMARY KEY
 ((image_caseid), x, uuid));
 cqlsh:images> create index results1_y on results1(y);

 In the below query you can see I have image_caseid as the partition key,
 which is filtered. Even then it gives the error "No indexed columns
 present":

 cqlsh:images> select * from results1 where image_caseid='mehak' and x >
 100 and y > 100 order by image_caseid asc;

Re: Really high read latency

2015-03-23 Thread Ben Bromhead
nodetool cfhistograms is also very helpful in diagnosing these kinds of
data modelling issues.
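
(Usage: nodetool cfhistograms <keyspace> <table>, e.g. nodetool cfhistograms
default metrics here; the partition size percentiles it prints will show how
skewed these partitions are.)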

On 23 March 2015 at 14:43, Chris Lohfink clohfin...@gmail.com wrote:


   Compacted partition maximum bytes: 36904729268

 that's huge... 36GB rows are going to cause a lot of problems. Even when you
 specify a precise cell under this, it still has an enormous column index to
 deserialize on every read of the partition. As mentioned above, you should
 include your attribute name in the partition key ((row_time, attrs)) to
 spread this out... I'd call that critical.

 Chris

 On Mon, Mar 23, 2015 at 4:13 PM, Dave Galbraith 
 david92galbra...@gmail.com wrote:

 I haven't deleted anything. Here's output from a traced cqlsh query (I
 tried to make the spaces line up, hope it's legible):

 activity (source 172.31.32.211 for all rows) | timestamp | source_elapsed (us)
 Execute CQL3 query | 2015-03-23 21:04:37.422000 | 0
 Parsing select * from default.metrics where row_time = 16511 and attrs = '[redacted]' limit 100; [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 93
 Preparing statement [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 696
 Executing single-partition query on metrics [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 2807
 Acquiring sstable references [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 2993
 Merging memtable tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:37.426000 | 3049
 Partition index with 484338 entries found for sstable 15966 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202304
 Seeking to partition indexed section in data file [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202354
 Bloom filter allows skipping sstable 5613 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202445
 Bloom filter allows skipping sstable 5582 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202478
 Bloom filter allows skipping sstable 5611 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202508
 Bloom filter allows skipping sstable 5610 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 202539
 Bloom filter allows skipping sstable 5549 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202678
 Bloom filter allows skipping sstable 5544 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202720
 Bloom filter allows skipping sstable 5237 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202752
 Bloom filter allows skipping sstable 2516 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202782
 Bloom filter allows skipping sstable 2632 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202812
 Bloom filter allows skipping sstable 3015 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202852
 Skipped 0/11 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202882
 Merging data from memtables and 1 sstables [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 202902
 Read 101 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-03-23 21:04:38.626000 | 203752
 Request complete | 2015-03-23 21:04:38.628253 | 206253

 On Mon, Mar 23, 2015 at 11:53 AM, Eric Stevens migh...@gmail.com wrote:

 Enable tracing in cqlsh and see how many sstables are being lifted to
 satisfy the query (are you repeatedly writing to the same partition
 [row_time] over time?).

 Also watch for whether you're hitting a lot of tombstones (are you
 deleting lots of values in the same partition over time?).

 On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith 
 david92galbra...@gmail.com wrote:

 Duncan: I'm thinking it might be something like that. I'm also seeing just
 a ton of garbage collection on the box; could it be pulling rows for all
 100k attrs for a given row_time into memory, since only row_time is the
 partition key?

 Jens: I'm not using EBS (although I used to until I read up on how
 useless it is). I'm not sure what constitutes proper paging but my client
 has a pretty small amount of available memory so I'm doing pages of size 5k
 using the C++ Datastax driver.

 Thanks for the replies!

 -Dave

 On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil jens.ran...@tink.se
 

Deleted columns reappear after repair

2015-03-23 Thread Roman Tkachenko
Hey guys,

We're having a very strange issue: deleted columns get resurrected when
repair is run on a node.

Info about the setup. Cassandra 2.0.13, multi datacenter with 12 nodes in
one datacenter and 6 nodes in another one. Schema:

cqlsh> describe keyspace blackbook;

CREATE KEYSPACE blackbook WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'IAD': '3',
  'ORD': '3'
};

USE blackbook;

CREATE TABLE bounces (
  domainid text,
  address text,
  message text,
  timestamp bigint,
  PRIMARY KEY (domainid, address)
) WITH
  bloom_filter_fp_chance=0.10 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.00 AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

We're using wide rows for the bounces table, which can store hundreds of
thousands of addresses for each domainid (in practice it's usually much
less, but some rows may contain up to several million columns).

All queries are done using LOCAL_QUORUM consistency. Sometimes bounces are
deleted from the table using the following CQL3 statement:

delete from bounces where domainid = 'domain.com' and address = 'al...@example.com';

But the thing is, after repair is run on any node that owns the domain.com
key, the column gets resurrected on all nodes, as if the tombstone had
disappeared. We checked this multiple times using cqlsh: issue a delete
statement and verify that data is not returned; then run repair, and the
deleted data is returned again.
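
Concretely, the sequence is (a sketch of what we did; the address is elided
as above):

delete from bounces where domainid = 'domain.com' and address = 'al...@example.com';
select * from bounces where domainid = 'domain.com' and address = 'al...@example.com';  -- returns 0 rows
-- run: nodetool repair blackbook bounces (on a node owning the key)
select * from bounces where domainid = 'domain.com' and address = 'al...@example.com';  -- the row is back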

Our gc_grace_seconds has the default value, and no node was ever down for
anywhere close to 10 days, so it doesn't look like that's related. We also
made sure all our servers are running ntpd, so time synchronization should
not be an issue either.

Have you guys ever seen anything like this / have any idea as to what may
be causing this behavior? What could make a tombstone disappear during a
repair operation?

Thanks for your help. Let me know if I can provide more information.

Roman