Re: write timeout
Forgot to mention I am using Cassandra 2.0.13. On Mon, Mar 23, 2015 at 5:59 PM, Anishek Agarwal anis...@gmail.com wrote: [...]
Re: 2d or multi dimension range query in cassandra CQL
I am using Stratio Cassandra; it is way better than Stargate, as it works on the latest release of Cassandra and performs better for us. We are using it for a full-text search use case. Regards, Asit

On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi, on the basis of some suggestions I tried tuplejump for multidimensional queries, since the other options mostly needed root permissions (for building), which I don't have on my cluster account. I found a major problem in tuplejump (stargate-core): when I use it with a list-type field in my table, it stops working. For example:

create table person ( id int primary key, isActive boolean, age int, eyeColor varchar, name text, gender varchar, company varchar, email varchar, phone varchar, address text, points list<double>, stargate text );

with indexing as:

CREATE CUSTOM INDEX person_idx ON person (stargate) USING 'com.tuplejump.stargate.RowIndex' WITH OPTIONS = { 'sg_options':'{ fields:{ eyeColor:{}, age:{}, phone:{} } }' };

If I insert data into the table along with the points list, the following query won't give any results (0 rows):

SELECT * FROM RESULTS1 WHERE stargate = '{ filter: { type: range, field: x, lower: 0 } }';

If I remove points list<double> from the table it works, i.e. the same query returns results. Can somebody help me with this problem, as I couldn't find much support for Stargate? Please note that I am using Cassandra 2.0.9, which is compatible with stargate-core as given in the link (http://stargate-core.readthedocs.org/en/latest/quickstart.html).
Thanks, Mehak

On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña adelap...@stratio.com wrote: Hi, with Stratio Cassandra you can create Lucene-based indexes for multidimensional queries this way:

ALTER TABLE images.results1 ADD lucene text;

CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene) USING 'com.stratio.cassandra.index.RowIndex' WITH OPTIONS = { 'refresh_seconds':'1', 'schema':'{ fields:{ image_caseid:{type:string}, x:{type:double}, y:{type:double} } }' };

Then you can perform the query using the dummy column:

SELECT * FROM images.results1 WHERE lucene='{ filter:{type:boolean, must:[ {field:image_caseid, type:match, value:mehak}, {field:x, type:range, lower:100}, {field:y, type:range, lower:100} ]}}';

However, you can take advantage of the partition key to route the query only to the nodes owning the data:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{ filter:{type:boolean, must:[ {field:x, type:range, lower:100}, {field:y, type:range, lower:100} ]}}';

Or, even better:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x > 100 AND lucene='{ filter:{field:y, type:range, lower:100}}';

Additionally, if your data are geospatial (latitude and longitude), soon you will be able to use the incoming spatial features.

2015-03-17 23:01 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu: Sorry, I gave you the wrong table definition for the query. Here there is a composite key of image_caseid, x, and uuid, which is unique. I have used x as a clustering column so I can query on it, and a secondary index on the y column.

1. Example:

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, submit_date timestamp, points list<double>, PRIMARY KEY ((image_caseid), x, uuid));
cqlsh:images> create index results1_y on results1(y);

In the below query you can see I have image_caseid, the partition key, filtered by equality.
Even then it is giving the error "No indexed columns present":

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y > 100 order by image_caseid asc;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

2. Example: I also tried including both x and y columns in the composite key, but then the query gives the following error:

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, submit_date timestamp, points list<double>, PRIMARY KEY ((image_caseid), x, y, uuid));
cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y > 100 order by image_caseid asc;
code=2200 [Invalid query] message="PRIMARY KEY column y cannot be restricted (preceding column ColumnDefinition{name=x, type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null} is either not restricted or by a non-EQ relation)"

Thanks, Mehak

On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky jack.krupan...@gmail.com wrote: Yeah, you may have to add a dummy column populated with a constant,
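The second error above follows from how clustering columns are stored: within a partition, rows are sorted by (x, y), so a range on y is only a contiguous slice once x is fixed by equality. A small illustrative Python sketch (not Cassandra code; the sample values are made up) of why `x > 100` is servable as a slice while `x > 100 AND y > 100` is not:

```python
# Rows in one partition, kept in the order Cassandra stores them:
# sorted by the clustering columns (x, y).
rows = sorted([(150.0, 50.0), (150.0, 300.0),
               (200.0, 50.0), (200.0, 120.0),
               (90.0, 200.0)])

# "x > 100" is a contiguous tail of the sorted rows: the storage engine
# can seek to the first match and read forward.
x_slice = [r for r in rows if r[0] > 100]
assert x_slice == rows[1:]  # one contiguous run

# "x > 100 AND y > 100" selects rows scattered inside that run
# (positions 2 and 4 here), which is why Cassandra rejects a range on y
# unless the preceding clustering column x is bound by equality.
both = [r for r in rows if r[0] > 100 and r[1] > 100]
```

With `x = 150` fixed, by contrast, the matching rows for `y > 100` are again one contiguous run, which is the shape of query CQL allows.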
write timeout
Hello, I am using a single-node, server-class machine with 16 CPUs, 32 GB RAM, and a single drive attached. My table structure is as below:

CREATE TABLE t1 (id bigint, ts timestamp, cat1 set<text>, cat2 set<text>, lat float, lon float, a bigint, PRIMARY KEY (id, ts));

I am trying to insert 300 entries per partition key for 4000 partition keys using 25 threads. Configuration: write_request_timeout_in_ms: 5000, concurrent_writes: 32, heap space: 8 GB. The client-side timeout is 12 sec using the DataStax Java driver. Consistency level: ONE. With the above configuration I run this 10 times to eventually generate around 300 * 4000 * 10 = 12,000,000 entries. After the first few runs I get a WriteTimeoutException at the client with the message "1 replica were required but only 0 acknowledged the write". There are no errors in the server log. Why does this error occur, and how do I determine how far I should limit concurrent writes to a single node? Looking at iostat, disk utilization seems to be at 1-3% while running this. Please let me know if anything else is required. Regards, Anishek
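For reference, the insert volume described works out as follows (a quick sketch of the arithmetic, using only the figures given in the message):

```python
rows_per_partition = 300
partitions = 4000
runs = 10

# One run inserts 300 rows into each of 4000 partitions.
rows_per_run = rows_per_partition * partitions   # 1,200,000 rows per run
total_rows = rows_per_run * runs                 # 12,000,000 rows overall

# Note the two different timeouts in play: the server-side
# write_request_timeout_in_ms (5000 ms) is what produces the
# WriteTimeoutException; the 12 s driver-side timeout is separate.
print(rows_per_run, total_rows)  # -> 1200000 12000000
```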
Re: cassandra triggers
Attached is the code; follow the process for compiling and using it. If anything more is required please let me know. The JAR file has to be put into /usr/share/cassandra/conf/triggers. Hope this helps. Regards, Asit

On Mon, Mar 23, 2015 at 3:20 PM, Rahul Bhardwaj rahul.bhard...@indiamart.com wrote: Yes Asit, you can share it with me; let's see if we can implement it for our requirement. Regards: Rahul Bhardwaj

On Mon, Mar 23, 2015 at 1:43 PM, Asit KAUSHIK asitkaushikno...@gmail.com wrote: Hi Rahul, I have created a trigger which inserts a default value into the table, but everyone is against using it, since it is external code which may be incompatible with future releases. It was a challenge, as all the examples are for the old 2.0.x versions, where the RowMutation class is used, which is discontinued in later releases. If you still want the code I can give it to you. The approach is the same as on all the sites I used for reference, but again that code is for the older release and would not work: http://noflex.org/learn-experiment-cassandra-trigger/

On Mon, Mar 23, 2015 at 11:01 AM, Rahul Bhardwaj rahul.bhard...@indiamart.com wrote: Hi All, I want to use triggers in Cassandra. Is there any tutorial on creating triggers in Cassandra? Also, I am not good at Java. Please help! Regards: Rahul Bhardwaj

Follow IndiaMART.com http://www.indiamart.com for latest updates on this and more.
InvertedIndex.java Description: Binary data
Re: write timeout
My group is seeing the same thing and also cannot figure out why it's happening. On Mon, Mar 23, 2015 at 8:36 AM, Anishek Agarwal anis...@gmail.com wrote: [...] -- http://about.me/BrianTarbox
Re: cassandra triggers
Okay, if you leave a comment on the blog saying what is breaking and on which Cassandra version, I can take a look at the code when I get the time. :-) jason

On Mon, Mar 23, 2015 at 8:15 PM, Asit KAUSHIK asitkaushikno...@gmail.com wrote: Attached is the code; follow the process for compiling and using it. If anything more is required please let me know. The JAR file has to be put into /usr/share/cassandra/conf/triggers. Hope this helps. Regards, asit

On Mon, Mar 23, 2015 at 3:20 PM, Rahul Bhardwaj rahul.bhard...@indiamart.com wrote: [...]
Re: Unknown CF / Schema OK
I did figure this out: When adding a columnfamily, the query timed out before all nodes replied, and I sent the schema out again. Half the nodes ended up with the CF having UUID A and half the nodes ended up with the new CF but UUID B. UnknownColumnFamilyExceptions were thrown until the enqueued data exceeded memory. Eventually one half of the nodes crashed, with the other half having a consistent view of the CF. At this point I just dropped the offending CF schema in the active cluster, then the downed nodes could be re-added successfully. We lost some data. :( On Sun, Mar 22, 2015 at 11:39 AM, Tim Olson kash...@gmail.com wrote: After upgrading a schema, I'm getting lots of UnknownColumnFamilyException in the logs. However, all nodes have the same schema as reported by nodetool describecluster. I queried the system tables for the given column family UUID, but it doesn't appear in any of the schemas on any of the nodes. I restarted all clients, but that didn't help either. The cluster was running 2.1.2 but I recently upgraded to 2.1.3. Any ideas? This is basically making our production cluster highly unresponsive. Tim
Re: cassandra triggers
Yes Asit, you can share it with me; let's see if we can implement it for our requirement. Regards: Rahul Bhardwaj

On Mon, Mar 23, 2015 at 1:43 PM, Asit KAUSHIK asitkaushikno...@gmail.com wrote: Hi Rahul, I have created a trigger which inserts a default value into the table, but everyone is against using it, since it is external code which may be incompatible with future releases. It was a challenge, as all the examples are for the old 2.0.x versions, where the RowMutation class is used, which is discontinued in later releases. If you still want the code I can give it to you. The approach is the same as on all the sites I used for reference, but again that code is for the older release and would not work: http://noflex.org/learn-experiment-cassandra-trigger/

On Mon, Mar 23, 2015 at 11:01 AM, Rahul Bhardwaj rahul.bhard...@indiamart.com wrote: Hi All, I want to use triggers in Cassandra. Is there any tutorial on creating triggers in Cassandra? Also, I am not good at Java. Please help! Regards: Rahul Bhardwaj
Re: Really high read latency
Duncan: I'm thinking it might be something like that. I'm also seeing a ton of garbage collection on the box; could it be pulling rows for all 100k attrs for a given row_time into memory, since only row_time is the partition key? Jens: I'm not using EBS (although I used to, until I read up on how useless it is). I'm not sure what constitutes proper paging, but my client has a pretty small amount of available memory, so I'm doing pages of size 5k using the C++ DataStax driver. Thanks for the replies! -Dave

On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil jens.ran...@tink.se wrote: Also, two control questions: - Are you using EBS for data storage? It might introduce additional latencies. - Are you doing proper paging when querying the keyspace? Cheers, Jens

On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith david92galbra...@gmail.com wrote: Hi! So I've got a table like this:

CREATE TABLE default.metrics (
  row_time int,
  attrs varchar,
  offset int,
  value double,
  PRIMARY KEY (row_time, attrs, offset)
) WITH COMPACT STORAGE
  AND bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment=''
  AND dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND index_interval=128
  AND read_repair_chance=1 AND replicate_on_write='true' AND populate_io_cache_on_flush='false'
  AND default_time_to_live=0 AND speculative_retry='NONE' AND memtable_flush_period_in_ms=0
  AND compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
  AND compression={'sstable_compression':'LZ4Compressor'};

and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB of heap space. It's timeseries data, so I increment row_time each day; attrs is additional identifying information about each series, and offset is the number of milliseconds into the day for each data point. For the past 5 days I've been inserting 3k points/second distributed across 100k distinct attrses.
And now when I try to run queries on this data that look like

SELECT * FROM default.metrics WHERE row_time = 5 AND attrs = 'potatoes_and_jam'

it takes an absurdly long time and sometimes just times out. I ran nodetool cfstats default and here's what I get:

Keyspace: default
  Read Count: 59
  Read Latency: 397.12523728813557 ms.
  Write Count: 155128
  Write Latency: 0.3675690719921613 ms.
  Pending Flushes: 0
  Table: metrics
    SSTable count: 26
    Space used (live): 35146349027
    Space used (total): 35146349027
    Space used by snapshots (total): 0
    SSTable Compression Ratio: 0.10386468749216264
    Memtable cell count: 141800
    Memtable data size: 31071290
    Memtable switch count: 41
    Local read count: 59
    Local read latency: 397.126 ms
    Local write count: 155128
    Local write latency: 0.368 ms
    Pending flushes: 0
    Bloom filter false positives: 0
    Bloom filter false ratio: 0.0
    Bloom filter space used: 2856
    Compacted partition minimum bytes: 104
    Compacted partition maximum bytes: 36904729268
    Compacted partition mean bytes: 986530969
    Average live cells per slice (last five minutes): 501.66101694915255
    Maximum live cells per slice (last five minutes): 502.0
    Average tombstones per slice (last five minutes): 0.0
    Maximum tombstones per slice (last five minutes): 0.0

Ouch! 400 ms of read latency, orders of magnitude higher than it has any right to be. How could this have happened? Is there something fundamentally broken about my data model? Thanks!

-- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
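The cfstats above are consistent with one giant partition per day: with only row_time in the partition key, every attrs series for a day lands in the same partition. A back-of-the-envelope sketch, using the 3k points/second insert rate and the reported maximum partition size from this thread:

```python
points_per_second = 3_000
seconds_per_day = 86_400

# All writes for a day share one row_time, hence one partition.
cells_per_day = points_per_second * seconds_per_day   # 259,200,000 cells

# "Compacted partition maximum bytes" from the cfstats output above.
max_partition_bytes = 36_904_729_268

# Rough on-disk cost per cell, implying the 36 GB figure is plausible
# for a single day's partition at this ingest rate.
bytes_per_cell = max_partition_bytes / cells_per_day  # ~142 bytes/cell
print(cells_per_day, round(bytes_per_cell))
```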
Re: Really high read latency
Also, two control questions: - Are you using EBS for data storage? It might introduce additional latencies. - Are you doing proper paging when querying the keyspace? Cheers, Jens

On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith david92galbra...@gmail.com wrote: [...]

-- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
Re: Logging client ID for YCSB workloads on Cassandra?
Sure. I updated the YCSB code to pass a client ID as an input parameter, stored the client ID in the properties, and used it in the DBWrapper class for per-operation logging (YCSB/core/src/main/java/com/yahoo/ycsb/DBWrapper.java). Please let me know if you need more information; I can share my code samples if that helps. — Jatin Ganhotra, Graduate Student, Computer Science, University of Illinois at Urbana-Champaign http://jatinganhotra.com http://linkedin.com/in/jatinganhotra

On Fri, Mar 20, 2015 at 12:03 PM, Jan cne...@yahoo.com wrote: Hi Jatin; besides enabling tracing, is there any other way to get the task done (to log the client ID for every operation)? Please share the solution with the community, so that we can collectively learn from your experience. Cheers, Jan

On Friday, February 20, 2015 12:48 PM, Jatin Ganhotra jatin.ganho...@gmail.com wrote: Never mind, got it working. Thanks :)

On Wed, Feb 18, 2015 at 7:09 PM, Jatin Ganhotra jatin.ganho...@gmail.com wrote: Hi, I'd like to log the client ID for every operation performed by YCSB on my Cassandra cluster. The purpose is to identify and analyze various consistency measures other than eventual consistency. I wanted to know if people have done something similar in the past, or am I missing something really basic here? Please let me know if you need more information. Thanks — Jatin Ganhotra
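The general idea of tagging every operation with a client ID, as described above, can be sketched without touching YCSB itself. This is an illustrative Python sketch only; the names `logged` and `client_id` are mine, not YCSB or DataStax API:

```python
import functools
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("ops")

def logged(client_id):
    """Wrap a DB operation so every call is logged with the client ID."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            # One log line per operation, carrying the client identity.
            log.info("client=%s op=%s args=%r", client_id, fn.__name__, args)
            return result
        return wrapper
    return decorator

@logged(client_id="client-42")
def read(table, key):
    # Stand-in for a real DB read.
    return {"table": table, "key": key}

read("usertable", "user1")  # logs: client=client-42 op=read args=('usertable', 'user1')
```

The same wrapping-at-the-call-boundary pattern is what the DBWrapper change described above does inside YCSB's Java codebase.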
Fwd: [RELEASE] Kundera-2.16 (Added support for Cassandra's UDTs)
On Tuesday, 17 March 2015 22:42:30 UTC+5:30, Chhavi Gangwal wrote: Hi All, we are happy to announce the Kundera-2.16 release. Kundera is a JPA 2.1-compliant, polyglot object-datastore mapping library for NoSQL datastores. The idea behind Kundera is to make working with NoSQL databases drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB, Redis, OracleNoSQL, Neo4j, ElasticSearch, CouchDB and relational databases.

Major Changes in 2.16 and 2.15.1:
=
1) Support added for Cassandra 2.1.x.
2) Support for Cassandra User Defined Types as embeddables.
3) Aggregation support available with Elasticsearch is also enabled in Kundera.
4) HBase data remodeling with support for HBase 1.0.

Support for aggregate functions is also extended to other Kundera clients using Elasticsearch as the indexing store. Support for HBase 1.0 with the revised data model is available via Kundera's kundera-hbase-v2 dependency. Further details on this will be added to the wiki soon.

Github Bug Fixes:
=
https://github.com/impetus-opensource/Kundera/issues/716
https://github.com/impetus-opensource/Kundera/issues/714
https://github.com/impetus-opensource/Kundera/issues/659
https://github.com/impetus-opensource/Kundera/issues/641
https://github.com/impetus-opensource/Kundera/issues/708
https://github.com/impetus-opensource/Kundera/issues/707
https://github.com/impetus-opensource/Kundera/issues/693
https://github.com/impetus-opensource/Kundera/issues/672

How to Download:
=
To download, use or contribute to Kundera, visit: http://github.com/impetus-opensource/Kundera The latest release tag of Kundera is 2.16, whose maven libraries are now available at: https://oss.sonatype.org/content/repositories/releases/com/impetus. The 2.16 release of Kundera is compatible with Cassandra 2.x and has JDK 1.7 as one of its prerequisites.
The older versions of Cassandra (1.x) over JPA 2.0 can be used with archived versions of Kundera and its current release branch, kundera-2.12-1.x, hosted at https://github.com/impetus-opensource/Kundera/releases/tag/kundera-2.12-1.x, whose maven libraries are also available at https://oss.sonatype.org/content/repositories/releases/com/impetus. Sample code and examples for using Kundera can be found here: https://github.com/impetus-opensource/Kundera/tree/trunk/src/kundera-tests

Troubleshooting:
===
In case you are using the 2.16 version of Kundera with Cassandra, make sure you have JDK 1.7 installed. Also, if you wish to use UDT support with Cassandra, please enable CQL3 before performing any operations. Please share your feedback with us by filling in a simple survey: http://www.surveymonkey.com/s/BMB9PWG Thank you all for your contributions and for using Kundera! Regards, Kundera Team. Follow us on twitter https://twitter.com/kundera_impetus, linkedin http://in.linkedin.com/pub/kundera-impetus/b4/870/153
Cassandra time series + Spark
Hi, I'm working on a system which has to deal with time series data. I've been happy using Cassandra for time series, and Spark looks promising as a computational platform. I consider chunking time series in Cassandra necessary, e.g. by 3 weeks as kairosdb does it. This allows an 8-byte chunk start timestamp with 4-byte offsets for the individual measurements, and it keeps the data below 2x10^9 cells per partition even at 1000 Hz. This schema works quite okay when dealing with one time series at a time. Because the data is partitioned by time series id and chunk of time (e.g. the three weeks mentioned above), it requires a little client-side logic to retrieve the partitions and glue them together, but this is quite okay. However, when working with many / all of the time series in a table at once, e.g. in Spark, the story changes dramatically. Say I want to compute something as simple as a moving average: I have to deal with data all over the place. I can't currently think of anything but performing aggregateByKey, causing a shuffle every time. Does anyone have experience combining time series chunking with computation on all / many time series at once? Any advice? Cheers, Frens Jan
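The kairosdb-style chunking described above (8-byte chunk start timestamp, 4-byte offsets, 3-week buckets) can be sketched in a few lines of Python; the helper names here are mine, the numbers come from the message:

```python
CHUNK_MS = 3 * 7 * 24 * 3600 * 1000  # three weeks in ms = 1,814,400,000

def chunk_key(ts_ms):
    """Partition-key timestamp: start of the 3-week bucket containing ts_ms."""
    return ts_ms - (ts_ms % CHUNK_MS)

def offset(ts_ms):
    """Offset of a measurement inside its chunk."""
    return ts_ms % CHUNK_MS

ts = 1_427_116_800_000  # an example millisecond timestamp
assert chunk_key(ts) + offset(ts) == ts

# The offset stays below CHUNK_MS (~1.81e9), so it fits in a signed
# 32-bit int, and even at 1000 Hz a chunk holds < 2e9 samples, matching
# the "below 2x10^9" observation in the message.
assert 0 <= offset(ts) < CHUNK_MS < 2**31
```

Reading one series over a time range then means enumerating the chunk keys between `chunk_key(start)` and `chunk_key(end)` and gluing the slices together client-side, which is the "little client side logic" mentioned above.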
Re: cassandra triggers
Hi Rahul, I have created a trigger which inserts a default value into the table, but everyone is against using it, since it is external code which may be incompatible with future releases. It was a challenge, as all the examples are for the old 2.0.x versions, where the RowMutation class is used, which is discontinued in later releases. If you still want the code I can give it to you. The approach is the same as on all the sites I used for reference, but again that code is for the older release and would not work: http://noflex.org/learn-experiment-cassandra-trigger/

On Mon, Mar 23, 2015 at 11:01 AM, Rahul Bhardwaj rahul.bhard...@indiamart.com wrote: Hi All, I want to use triggers in Cassandra. Is there any tutorial on creating triggers in Cassandra? Also, I am not good at Java. Please help! Regards: Rahul Bhardwaj
Re: Really high read latency
Compacted partition maximum bytes: 36904729268 ... that's huge. 36 GB partitions are going to cause a lot of problems; even when you specify a precise cell underneath it, there is still an enormous column index to deserialize on every read of the partition. As mentioned above, you should include your attribute name in the partition key ((row_time, attrs)) to spread this out. I'd call that critical. Chris

On Mon, Mar 23, 2015 at 4:13 PM, Dave Galbraith david92galbra...@gmail.com wrote: I haven't deleted anything. Here's output from a traced cqlsh query (I tried to make the spaces line up, hope it's legible):

Execute CQL3 query | 2015-03-23 21:04:37.422000 | 172.31.32.211 | 0
Parsing select * from default.metrics where row_time = 16511 and attrs = '[redacted]' limit 100; [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 172.31.32.211 | 93
Preparing statement [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 172.31.32.211 | 696
Executing single-partition query on metrics [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 172.31.32.211 | 2807
Acquiring sstable references [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 172.31.32.211 | 2993
Merging memtable tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:37.426000 | 172.31.32.211 | 3049
Partition index with 484338 entries found for sstable 15966 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202304
Seeking to partition indexed section in data file [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202354
Bloom filter allows skipping sstable 5613 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202445
Bloom filter allows skipping sstable 5582 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202478
Bloom filter allows skipping sstable 5611 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202508
Bloom filter allows skipping sstable 5610 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202539
Bloom filter allows skipping sstable 5549 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202678 Bloom filter allows skipping sstable 5544 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202720 Bloom filter allows skipping sstable 5237 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202752 Bloom filter allows skipping sstable 2516 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202782 Bloom filter allows skipping sstable 2632 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202812 Bloom filter allows skipping sstable 3015 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202852 Skipped 0/11 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202882 Merging data from memtables and 1 sstables [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202902 Read 101 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-03-23 21:04:38.626000 | 172.31.32.211 | 203752 Request complete | 2015-03-23 21:04:38.628253 | 172.31.32.211 | 206253 On Mon, Mar 23, 2015 at 11:53 AM, Eric Stevens migh...@gmail.com wrote: Enable tracing in cqlsh and see how many sstables are being lifted to satisfy the query (are you repeatedly writing to the same partition [row_time]) over time?). Also watch for whether you're hitting a lot of tombstones (are you deleting lots of values in the same partition over time?). On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith david92galbra...@gmail.com wrote: Duncan: I'm thinking it might be something like that. I'm also seeing just a ton of garbage collection on the box, could it be pulling rows for all 100k attrs for a given row_time into memory since only row_time is the partition key? Jens: I'm not using EBS (although I used to until I read up on how useless it is). 
I'm not sure what constitutes proper paging but my client has a pretty small amount of available memory so I'm doing pages of size 5k using the C++ Datastax driver. Thanks for the replies! -Dave On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil jens.ran...@tink.se wrote: Also, two control questions: - Are you using EBS for data storage? It might introduce additional latencies. - Are you doing proper paging when querying the
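Chris's suggested remodel can be sketched in CQL (a hypothetical sketch, not a schema anyone in the thread has confirmed). With only row_time as the partition key, one day's partition absorbs all ~100k series at ~3k points/sec, roughly 3,000 x 86,400 = 259M cells per day, which is consistent with the 36 GB compacted-partition maximum reported. Moving attrs into a composite partition key makes each (day, series) pair its own partition:

```sql
-- Hypothetical revision of default.metrics: attrs joins row_time in the
-- partition key, so reads deserialize one small per-series partition
-- instead of indexing into a multi-GB per-day partition.
CREATE TABLE default.metrics (
    row_time int,       -- day bucket
    attrs varchar,      -- series identifier
    offset int,         -- milliseconds into the day
    value double,
    PRIMARY KEY ((row_time, attrs), offset)
) WITH COMPACT STORAGE;

-- The existing read pattern is unchanged and now hits a single partition:
-- SELECT * FROM default.metrics
-- WHERE row_time = 16511 AND attrs = '[redacted]' LIMIT 100;
```

Note that Cassandra cannot change a table's partition key in place; existing data would have to be rewritten into the new table.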
Re: Really high read latency
I haven't deleted anything. Here's output from a traced cqlsh query (I tried to make the spaces line up, hope it's legible): [trace quoted in full in Chris Lohfink's reply earlier in this digest; the 484338-entry partition index read for sstable 15966 accounts for ~200 ms of the 206 ms total]

On Mon, Mar 23, 2015 at 11:53 AM, Eric Stevens migh...@gmail.com wrote: [quoted in full earlier in this digest]

On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith david92galbra...@gmail.com wrote: Hi!
So I've got a table like this:

CREATE TABLE default.metrics (
    row_time int,
    attrs varchar,
    offset int,
    value double,
    PRIMARY KEY (row_time, attrs, offset)
) WITH COMPACT STORAGE
    AND bloom_filter_fp_chance=0.01
    AND caching='KEYS_ONLY'
    AND comment=''
    AND dclocal_read_repair_chance=0
    AND gc_grace_seconds=864000
    AND index_interval=128
    AND read_repair_chance=1
    AND replicate_on_write='true'
    AND populate_io_cache_on_flush='false' AND
Re: 2d or multi dimension range query in cassandra CQL
Hi, I checked Stratio Cassandra but couldn't find any good documentation for it. Can you give me some pointers on how to use it? Do I have to build it from source, or can I use it directly with jar files, as in the case of Stargate? I was looking for a solution that doesn't need a full build and can be used with an existing tar of Cassandra, because I have some restrictions on installing things on my server. Thanks, Mehak

On Mon, Mar 23, 2015 at 8:17 AM, Asit KAUSHIK asitkaushikno...@gmail.com wrote:

I am using Stratio Cassandra; it is way better than Stargate, as it works on the latest release of Cassandra and performs better for us. We are using it for a full-text search use case. Regards, Asit

On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote:

Hi, On the basis of some suggestions, I tried using tuplejump for multidimensional queries, since the others mostly needed root permissions (for building), which I don't have on my cluster account. I found a major problem in tuplejump (stargate-core): when I use it with a list-type field in my table, it stops working. For example:

create table person (
    id int primary key,
    isActive boolean,
    age int,
    eyeColor varchar,
    name text,
    gender varchar,
    company varchar,
    email varchar,
    phone varchar,
    address text,
    points list<double>,
    stargate text
);

with indexing as:

CREATE CUSTOM INDEX person_idx ON PERSON(stargate)
USING 'com.tuplejump.stargate.RowIndex'
WITH options = { 'sg_options':'{ fields:{ eyeColor:{}, age:{}, phone:{} } }' };

If I insert data into the table along with the points list, the following query won't give any results (0 rows):

SELECT * FROM RESULTS1 WHERE stargate ='{ filter: { type: range, field: x, lower: 0 } }';

I tried removing points list<double> from the table and it works, i.e. the same query returns results. Can somebody help me with this problem, as I couldn't find much support from Stargate? Please note that I am using Cassandra 2.0.9, compatible with Stargate-core as given in this link (http://stargate-core.readthedocs.org/en/latest/quickstart.html). Thanks, Mehak

On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña adelap...@stratio.com wrote:

Hi, With Stratio Cassandra you can create Lucene-based indexes for multidimensional queries this way:

ALTER TABLE images.results1 ADD lucene text;

CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene)
USING 'com.stratio.cassandra.index.RowIndex'
WITH OPTIONS = { 'refresh_seconds':'1',
    'schema':'{ fields:{ image_caseid:{type:string}, x:{type:double}, y:{type:double} } }'};

Then you can perform the query using the dummy column:

SELECT * FROM images.results1 WHERE lucene='{ filter:{type:boolean, must:[
    {field:image_caseid, type:match, value:mehak},
    {field:x, type:range, lower:100},
    {field:y, type:range, lower:100} ]}}';

However, you can take advantage of the partition key to route the query only to the nodes owning the data:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{ filter:{type:boolean, must:[
    {field:x, type:range, lower:100},
    {field:y, type:range, lower:100} ]}}';

Or, even better:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x > 100 AND lucene='{ filter:{field:y, type:range, lower:100}}';

Additionally, if your data are geospatial (latitude and longitude), you will soon be able to use the incoming spatial features.

2015-03-17 23:01 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu:

Sorry, I gave you the wrong table definition for the query. Here there is a composite key of image_caseid, x and uuid, which is unique. I have used x in the clustering columns to query it, and used a secondary index on the y column.

1. Example:

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, submit_date timestamp, points list<double>, PRIMARY KEY ((image_caseid), x, uuid));
cqlsh:images> create index results1_y on results1(y);

In the query below you can see I have image_caseid as the primary key, which is filtered. Even then it gives the error "No indexed columns present":

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y > 100 order by image_caseid asc;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

2. Example: I also tried including both x and y columns in the composite key; even then the query gives the following error:

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, submit_date timestamp, points list<double>, PRIMARY KEY ((image_caseid), x, y, uuid));
cqlsh:images> select * from results1 where
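For what it's worth, in Cassandra 2.0 a range predicate on a secondary-index column is only accepted alongside an equality predicate on some indexed column, which is why example 1 fails even though the partition key is restricted; the ORDER BY on the partition key is also invalid, since ORDER BY may only name clustering columns. Two hedged variants that are sometimes accepted (a sketch against the example-1 schema, untested):

```sql
-- Variant 1: ALLOW FILTERING tells the coordinator it may have to filter
-- the y range itself within the partition; can be expensive on wide rows.
SELECT * FROM results1
WHERE image_caseid = 'mehak' AND x > 100 AND y > 100
ALLOW FILTERING;

-- Variant 2: restrict only the partition key and the clustering column in
-- CQL, then filter y > 100 in the application while paging the results.
SELECT x, y, uuid FROM results1
WHERE image_caseid = 'mehak' AND x > 100;
```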
Re: cassandra triggers
On Sun, Mar 22, 2015 at 10:31 PM, Rahul Bhardwaj rahul.bhard...@indiamart.com wrote: I want to use triggers in Cassandra. Is there any tutorial on creating triggers in Cassandra? For the record, it is my understanding that you almost certainly should not use the current Cassandra triggers in production for any real work. =Rob
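For reference (and bearing Rob's warning in mind), the CQL side of a trigger is a single statement; the behavior lives in a Java class implementing ITrigger whose jar must be deployed on every node. A sketch using the InvertedIndex example class that ships in the Cassandra source tree; the keyspace and table names are placeholders:

```sql
-- Attach the example trigger to a table (the jar containing the class
-- must be in the triggers directory on every node in the cluster).
CREATE TRIGGER myTrigger ON mykeyspace.mytable
USING 'org.apache.cassandra.triggers.InvertedIndex';

-- Detach it again:
DROP TRIGGER myTrigger ON mykeyspace.mytable;
```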
Re: write timeout
On Mon, Mar 23, 2015 at 7:27 AM, Brian Tarbox briantar...@gmail.com wrote: My group is seeing the same thing and also cannot figure out why it's happening. On Mon, Mar 23, 2015 at 8:36 AM, Anishek Agarwal anis...@gmail.com wrote: Forgot to mention I am using Cassandra 2.0.13 This seems like a rather significant bug in the most recent stable version. In this case, I would tend to file a JIRA first and ask the mailing list second. Could one or both of you file steps-to-reproduce with a JIRA at http://issues.apache.org? =Rob
Re: Really high read latency
Enable tracing in cqlsh and see how many sstables are being lifted to satisfy the query (are you repeatedly writing to the same partition [row_time] over time?). Also watch for whether you're hitting a lot of tombstones (are you deleting lots of values in the same partition over time?).

On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith david92galbra...@gmail.com wrote:

Duncan: I'm thinking it might be something like that. I'm also seeing just a ton of garbage collection on the box; could it be pulling rows for all 100k attrs for a given row_time into memory, since only row_time is the partition key? Jens: I'm not using EBS (although I used to, until I read up on how useless it is). I'm not sure what constitutes proper paging, but my client has a pretty small amount of available memory, so I'm doing pages of size 5k using the C++ DataStax driver. Thanks for the replies! -Dave

On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil jens.ran...@tink.se wrote:

Also, two control questions: - Are you using EBS for data storage? It might introduce additional latencies. - Are you doing proper paging when querying the keyspace? Cheers, Jens

On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith david92galbra...@gmail.com wrote:

Hi! So I've got a table like this:

CREATE TABLE default.metrics (
    row_time int,
    attrs varchar,
    offset int,
    value double,
    PRIMARY KEY (row_time, attrs, offset)
) WITH COMPACT STORAGE
    AND bloom_filter_fp_chance=0.01
    AND caching='KEYS_ONLY'
    AND comment=''
    AND dclocal_read_repair_chance=0
    AND gc_grace_seconds=864000
    AND index_interval=128
    AND read_repair_chance=1
    AND replicate_on_write='true'
    AND populate_io_cache_on_flush='false'
    AND default_time_to_live=0
    AND speculative_retry='NONE'
    AND memtable_flush_period_in_ms=0
    AND compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
    AND compression={'sstable_compression':'LZ4Compressor'};

and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB of heap space. It's timeseries data, so I increment row_time each day, attrs is additional identifying information about each series, and offset is the number of milliseconds into the day for each data point. For the past 5 days I've been inserting 3k points/second distributed across 100k distinct attrses. Now when I try to run queries on this data that look like

SELECT * FROM default.metrics WHERE row_time = 5 AND attrs = 'potatoes_and_jam'

it takes an absurdly long time and sometimes just times out. I did nodetool cfstats default and here's what I get:

Keyspace: default
    Read Count: 59
    Read Latency: 397.12523728813557 ms
    Write Count: 155128
    Write Latency: 0.3675690719921613 ms
    Pending Flushes: 0
    Table: metrics
        SSTable count: 26
        Space used (live): 35146349027
        Space used (total): 35146349027
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.10386468749216264
        Memtable cell count: 141800
        Memtable data size: 31071290
        Memtable switch count: 41
        Local read count: 59
        Local read latency: 397.126 ms
        Local write count: 155128
        Local write latency: 0.368 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.0
        Bloom filter space used: 2856
        Compacted partition minimum bytes: 104
        Compacted partition maximum bytes: 36904729268
        Compacted partition mean bytes: 986530969
        Average live cells per slice (last five minutes): 501.66101694915255
        Maximum live cells per slice (last five minutes): 502.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

Ouch! 400 ms of read latency, orders of magnitude higher than it has any right to be. How could this have happened? Is there something fundamentally broken about my data model? Thanks!
-- Jens Rantil, Backend engineer, Tink AB. Email: jens.ran...@tink.se, Phone: +46 708 84 18 32, Web: www.tink.se
Re: 2d or multi dimension range query in cassandra CQL
Hi, You can download the Stratio Cassandra binaries from https://s3.amazonaws.com/stratioorg/cassandra/stratio-cassandra-2.1.3.1-bin.tar.gz. You can get info about how to build it and getting started in its README file (https://github.com/Stratio/stratio-cassandra/blob/master/README.md). More detailed info can be found at https://github.com/Stratio/stratio-cassandra/blob/master/doc/stratio/extended-search-in-cassandra.md. Regards,

2015-03-23 18:07 GMT+01:00 Mehak Mehta meme...@cs.stonybrook.edu: [the earlier messages in this thread are quoted in full above in this digest]
Re: Really high read latency
nodetool cfhistograms is also very helpful in diagnosing these kinds of data-modelling issues.

On 23 March 2015 at 14:43, Chris Lohfink clohfin...@gmail.com wrote: [message and trace quoted in full earlier in this digest]
Deleted columns reappear after repair
Hey guys, We're having a very strange issue: deleted columns get resurrected when repair is run on a node.

Info about the setup: Cassandra 2.0.13, multi-datacenter with 12 nodes in one datacenter and 6 nodes in another. Schema:

cqlsh> describe keyspace blackbook;

CREATE KEYSPACE blackbook WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'IAD': '3',
    'ORD': '3'
};

USE blackbook;

CREATE TABLE bounces (
    domainid text,
    address text,
    message text,
    timestamp bigint,
    PRIMARY KEY (domainid, address)
) WITH bloom_filter_fp_chance=0.10
    AND caching='KEYS_ONLY'
    AND comment=''
    AND dclocal_read_repair_chance=0.10
    AND gc_grace_seconds=864000
    AND index_interval=128
    AND read_repair_chance=0.00
    AND populate_io_cache_on_flush='false'
    AND default_time_to_live=0
    AND speculative_retry='99.0PERCENTILE'
    AND memtable_flush_period_in_ms=0
    AND compaction={'class': 'LeveledCompactionStrategy'}
    AND compression={'sstable_compression': 'LZ4Compressor'};

We're using wide rows for the bounces table, which can store hundreds of thousands of addresses for each domainid (in practice it's usually much less, but some rows may contain up to several million columns). All queries are done using LOCAL_QUORUM consistency. Sometimes bounces are deleted from the table using the following CQL3 statement:

delete from bounces where domainid = 'domain.com' and address = 'al...@example.com';

But the thing is, after repair is run on any node that owns the domain.com key, the column gets resurrected on all nodes, as if the tombstone had disappeared. We checked this multiple times using cqlsh: issue a delete statement and verify that the data is not returned; then run repair, and the deleted data is returned again. Our gc_grace_seconds is the default value and no nodes were ever down for anywhere close to 10 days, so it doesn't look related. We also made sure all our servers are running ntpd, so time synchronization should not be an issue either.
Have you guys ever seen anything like this, or have any idea as to what may be causing this behavior? What could make a tombstone disappear during a repair operation? Thanks for your help. Let me know if I can provide more information. Roman
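One way to narrow this down (a hedged suggestion, reusing the elided address from the delete statement above): compare the write timestamp of the resurrected column on individual replicas before and after the repair. If some replica holds a copy with a newer timestamp than the tombstone, repair will legitimately propagate that copy back to the others.

```sql
-- Run against each replica in turn (e.g. a session pinned to one node
-- with CONSISTENCY ONE), before and after running repair:
SELECT address, message, writetime(message)
FROM blackbook.bounces
WHERE domainid = 'domain.com' AND address = 'al...@example.com';
```

If the writetime seen after repair is newer than the moment the delete was issued, something is re-writing the column; if it predates the delete, the tombstone itself is being lost, which would be worth filing as a JIRA.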