Re: can't start cqlsh on new Amazon node
Hi. A bit more info on that. I have one working setup with python-cql 1.0.9-1, python-thrift 0.6.0-2~riptano1, and cassandra 1.0.8. The setup where cqlsh is not working has python-cql 1.0.10-1, python-thrift 0.6.0-2~riptano1, and cassandra 1.0.11. Maybe this will give someone a hint of what the problem may be and how to solve it. Thanks!

Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com

On Thu, Nov 8, 2012 at 9:38 AM, Tamar Fraenkel ta...@tok-media.com wrote:
> Nope... Same error:
>
>     cqlsh --debug --cql3 localhost 9160
>     Using CQL driver: module 'cql' from '/usr/lib/pymodules/python2.6/cql/__init__.pyc'
>     Using thrift lib: module 'thrift' from '/usr/lib/pymodules/python2.6/thrift/__init__.pyc'
>     Connection error: Invalid method name: 'set_cql_version'
>
> I believe it is some version mismatch. But this was the DataStax AMI; I thought all should be coordinated, and I am not sure what to check for. Thanks, Tamar

On Thu, Nov 8, 2012 at 4:56 AM, Jason Wee peich...@gmail.com wrote:
> Should it be --cql3? http://www.datastax.com/docs/1.1/dml/using_cql#start-cql3

On Wed, Nov 7, 2012 at 11:16 PM, Tamar Fraenkel ta...@tok-media.com wrote:
> Hi! I installed a new cluster using the DataStax AMI with --release 1.0.11, so I have cassandra 1.0.11 installed. Nodes have python-cql 1.0.10-1 and python 2.6. The cluster works well, BUT when I try to connect with cqlsh I get:
>
>     cqlsh --debug --cqlversion=2 localhost 9160
>     Using CQL driver: module 'cql' from '/usr/lib/pymodules/python2.6/cql/__init__.pyc'
>     Using thrift lib: module 'thrift' from '/usr/lib/pymodules/python2.6/thrift/__init__.pyc'
>     Connection error: Invalid method name: 'set_cql_version'
>
> This is the same if I choose cqlversion=3. Any idea how to solve? Thanks, Tamar Fraenkel
Storage limit for a particular user on Cassandra
Hi, Is there a way we can limit the data of a particular user on the Cassandra cluster? Say, for example, I have three users, Jsmith, Elvis, and Dilbert, configured in my Cassandra deployment, and I want to limit their data usage as follows: Jsmith - 1 GB, Elvis - 2 GB, Dilbert - 500 MB. Is there a way to achieve this by fine-tuning the configuration? If not, any workarounds? Thanks, ~Mallik.
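Cassandra itself has no per-user storage quota setting, so the usual workaround is to enforce the limit at the application layer. A minimal sketch (all names here are hypothetical; real usage accounting would need to be persisted and account for replication and overwrites):

```python
# Sketch of an application-level quota check. Cassandra does not enforce
# per-user storage limits, so the application tracks bytes written per user
# and rejects writes that would exceed the configured quota.

QUOTAS = {"jsmith": 1 * 1024**3, "elvis": 2 * 1024**3, "dilbert": 500 * 1024**2}

class QuotaExceeded(Exception):
    pass

class QuotaTracker:
    def __init__(self, quotas):
        self.quotas = quotas
        self.usage = {user: 0 for user in quotas}

    def check_and_record(self, user, nbytes):
        """Raise if the write would push the user over quota, else record it."""
        if self.usage[user] + nbytes > self.quotas[user]:
            raise QuotaExceeded(user)
        self.usage[user] += nbytes

tracker = QuotaTracker(QUOTAS)
tracker.check_and_record("dilbert", 400 * 1024**2)      # fits under 500 MB
try:
    tracker.check_and_record("dilbert", 200 * 1024**2)  # would exceed quota
except QuotaExceeded:
    print("write rejected")
```

The check would wrap whatever client call performs the actual insert, before the write is sent to the cluster.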
Compact and Repair
Hi,

We recently ran a major compaction across our cluster, which reduced the storage used by about 50%. This is fine, since we do a lot of updates to existing data, so that's the expected result.

The day after, we ran a full repair -pr across the cluster, and when that finished, each storage node was at about the same size as before the major compaction. Why does that happen? What gets transferred to other nodes, and why does it suddenly take up a lot of space again?

We haven't run repair -pr regularly, so is this just something that happens on the first weekly run, and can we expect a different result next week? Or does repair always cause the data to grow on each node? To me it just doesn't seem proportional.

/Henrik
Re: Strange delay in query
On Nov 7, 2012, at 12:15 PM, André Cruz andre.c...@co.sapo.pt wrote:
> This error also happens on my application that uses pycassa, so I don't think this is the same bug.

I have narrowed it down to a slice between two consecutive columns. Observe this behaviour using pycassa:

    DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'), column_count=2, column_start=uuid.UUID('13957152-234b-11e2-92bc-e0db550199f4')).keys()
    DEBUG 2012-11-08 11:55:51,170 pycassa_library.pool:30 6849 139928791262976 Connection 52905488 (xxx:9160) was checked out from pool 51715344
    DEBUG 2012-11-08 11:55:53,415 pycassa_library.pool:37 6849 139928791262976 Connection 52905488 (xxx:9160) was checked in to pool 51715344
    [UUID('13957152-234b-11e2-92bc-e0db550199f4'), UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')]

A two-column slice took more than 2s to return. If I request the next two-column slice:

    DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'), column_count=2, column_start=uuid.UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')).keys()
    DEBUG 2012-11-08 11:57:32,750 pycassa_library.pool:30 6849 139928791262976 Connection 52904912 (xxx:9160) was checked out from pool 51715344
    DEBUG 2012-11-08 11:57:32,774 pycassa_library.pool:37 6849 139928791262976 Connection 52904912 (xxx:9160) was checked in to pool 51715344
    [UUID('40b7ae4e-2449-11e2-8610-e0db550199f4'), UUID('a364b028-2449-11e2-8882-e0db550199f4')]

This takes 20 msec... Is there a rational explanation for this different behaviour? Is there some threshold that I'm running into? Is there any way to obtain more debugging information about this problem?

Thanks,
André
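One way to narrow this down further is to time the same fixed-size slice from a series of start columns and see exactly where the latency jumps. A small sketch (not a pycassa API; the `fetch` callable is whatever wrapper you already have around `cf.get`):

```python
import time

def time_slices(fetch, starts):
    """Time a fixed-size slice query from each start column.

    `fetch` is any callable that performs the slice; `starts` is a list of
    start columns to probe. Returns (start, seconds) pairs so outliers --
    slices that cross a problematic column range -- stand out.
    """
    timings = []
    for start in starts:
        t0 = time.time()
        fetch(start)
        timings.append((start, time.time() - t0))
    return timings
```

With pycassa this could be driven as `time_slices(lambda s: cf.get(row_key, column_count=2, column_start=s), starts)`, walking `starts` through the suspect region of the row.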
Re: Compact and Repair
No, we're not using columns with TTL, and I performed a major compaction before the repair, so there shouldn't be vast amounts of tombstones moving around. And the increase happened during the repair; the nodes gained ~20-30 GB each.

/Henrik

On Thu, Nov 8, 2012 at 12:40 PM, horschi hors...@gmail.com wrote:
> Hi, is it possible that your repair is over-repairing due to any of the issues discussed here: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/repair-compaction-and-tombstone-rows-td7583481.html ?
> I've seen repair increasing the load on my cluster, but what you are describing sounds like a lot to me. Does this increase happen due to repair entirely? Or was the load maybe increasing gradually over the week and you just checked for the first time?
> cheers, Christian
Re: Compact and Repair
Did you change the RF, or was a node down since you repaired last time?

2012/11/8 Henrik Schröder skro...@gmail.com
> No, we're not using columns with TTL, and I performed a major compaction before the repair, so there shouldn't be vast amounts of tombstones moving around. And the increase happened during the repair; the nodes gained ~20-30 GB each.
Re: Compact and Repair
No, we haven't changed RF, but it's been a very long time since we repaired last, so we're guessing this is an effect of not running repair regularly, and that doing it regularly will fix it. It would just be nice to know.

Also, running a major compaction after the repair made the data size shrink back to what it was before, so clearly a lot of junk data was sent over on that repair, most probably tombstones of some kind, as discussed in the other thread.

/Henrik

On Thu, Nov 8, 2012 at 1:53 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:
> Did you change the RF or had a node down since you repaired last time?
How to insert composite column in CQL3?
Hi there! I've been struggling to figure out (for quite a few hours now) how I can insert, for example, a column with a TimeUUID name and an empty value in CQL3, in a fictional table. And what would the table design be? I'm interested in the syntax (e.g. an example). I'm trying to do something like Matt Dennis did here (Cassandra NYC 2011: Matt Dennis - Data Modeling Workshop): http://www.youtube.com/watch?v=OzBJrQZjge0&t=9m45s Is that even possible in CQL3? Tnx. Lp, Alan Ristić
Re: Strange delay in query
What is the size of columns? Probably those two are huge.

On Thu, Nov 8, 2012 at 4:01 AM, André Cruz andre.c...@co.sapo.pt wrote:
> I have narrowed it down to a slice between two consecutive columns. [...] A two column slice took more than 2s to return. [...] This takes 20msec... Is there a rational explanation for this different behaviour?
leveled compaction and tombstoned data
We are having a problem where we have huge SSTables with tombstoned data in them that is not being compacted soon enough (because size-tiered compaction requires, by default, 4 like-sized SSTables). This is using more disk space than we anticipated. We are very write-heavy compared to reads, and we delete the data after N days (N depends on the column family, but is around 7 days). My question is: would leveled compaction help get rid of the tombstoned data faster than size-tiered, and therefore reduce the disk space usage? thx
Re: leveled compaction and tombstoned data
On 8.11.2012 19:12, B. Todd Burruss wrote:
> my question is would leveled compaction help to get rid of the tombstoned data faster than size tiered, and therefore reduce the disk space usage?

Leveled compaction will kill your performance. Get the patch from JIRA for maximum sstable size per CF and force Cassandra to make smaller tables; they expire faster.
Re: How to insert composite column in CQL3?
Ok, this article answered all the confusion in my head: http://www.datastax.com/dev/blog/thrift-to-cql3 It's a must-read for noobs (like me). It perfectly explains the mappings and differences between the internals and CQL3 (the abstractions). First read this and THEN go study all the resources out there ;)

Lp, Alan Ristić
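For reference, here is one way the pattern from the talk could look in CQL3. This is a sketch, not a definitive answer (the table and column names are made up, and it assumes a CQL3-capable node): a Thrift row of TimeUUID-named, empty-valued columns maps to a CQL3 table whose clustering column is the TimeUUID, so "inserting a column with no value" becomes inserting a row that has only key columns.

```sql
-- Hypothetical table: partition key plus a timeuuid clustering column.
-- Each CQL3 row corresponds to one TimeUUID-named column in the old model,
-- so no separate value column is required.
CREATE TABLE user_events (
    username text,
    event_id timeuuid,
    PRIMARY KEY (username, event_id)
) WITH COMPACT STORAGE;

-- "Insert a column whose name is a TimeUUID and whose value is empty":
INSERT INTO user_events (username, event_id)
VALUES ('alan', 13957152-234b-11e2-92bc-e0db550199f4);
```

The Dennis talk's wide-row layout is then recovered by `SELECT event_id FROM user_events WHERE username = 'alan'`, which slices the TimeUUID-ordered columns of that partition.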
Kundera 2.2 released
Hi All, We are happy to announce the release of Kundera 2.2. Kundera is a JPA 2.0 based object-datastore mapping library for NoSQL datastores. The idea behind Kundera is to make working with NoSQL databases drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB, and relational databases.

Major Changes in this release:
* Geospatial Persistence and Queries for MongoDB
* Composite keys support for Cassandra and MongoDB
* Cassandra 1.1.6 migration
* Support for enum data type
* Named and Native queries support for REST based access

Github Issue Fixes (https://github.com/impetus-opensource/Kundera/issues):
* Issue 136 - JPQL queries without WHERE clause or parameters fail
* Issue 135 - MongoDB: enable WriteConcern, Safe mode and other properties on operation level
* Issue 133 - Externalize the database connection configuration
* Issue 132 - Problem in loading entity metadata when giving class name in class tag of persistence.xml
* Issue 130 - Row not fully deleted from cassandra on em.remove(obj) - then cannot reinsert row with same key

We have revamped our wiki, so you might want to have a look at it here: https://github.com/impetus-opensource/Kundera/wiki

To download, use, or contribute to Kundera, visit: http://github.com/impetus-opensource/Kundera The latest released tag version is 2.2. Kundera maven libraries are available at: https://oss.sonatype.org/content/repositories/releases/com/impetus Sample code and examples for using Kundera can be found here: http://github.com/impetus-opensource/Kundera-Examples and https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-tests

Thank you all for your contributions! Regards, Kundera Team.
Re: leveled compaction and tombstoned data
We are running DataStax Enterprise and cannot patch it. How bad is "kill performance"? If it is so bad, why is it an option?

On Thu, Nov 8, 2012 at 10:17 AM, Radim Kolar h...@filez.com wrote:
> leveled compaction will kill your performance. get patch from jira for maximum sstable size per CF and force cassandra to make smaller tables, they expire faster.
Re: leveled compaction and tombstoned data
"Kill performance" is relative. Leveled compaction basically costs 2x disk IO. Look at iostat, etc., and see if you have the headroom. There are also ways to bring up a test node and just run leveled compaction on that. Wish I had a URL handy, but hopefully someone else can find it.

Also, if you're not using compression, check it out.

On Thu, Nov 8, 2012 at 11:20 AM, B. Todd Burruss bto...@gmail.com wrote:
> we are running Datastax enterprise and cannot patch it. how bad is kill performance? if it is so bad, why is it an option?

-- 
Aaron Turner
http://synfin.net/  Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety." -- Benjamin Franklin
carpe diem quam minimum credula postero
Re: leveled compaction and tombstoned data
LCS works well in specific circumstances; this blog post gives some good considerations: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

On Nov 8, 2012, at 1:33 PM, Aaron Turner synfina...@gmail.com wrote:
> kill performance is relative. Leveled Compaction basically costs 2x disk IO. Look at iostat, etc and see if you have the headroom.
Re: leveled compaction and tombstoned data
On Thu, Nov 8, 2012 at 1:33 PM, Aaron Turner synfina...@gmail.com wrote: There are also ways to bring up a test node and just run Level Compaction on that. Wish I had a URL handy, but hopefully someone else can find it. This rather handsome fellow wrote a blog about it: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling -Brandon
Re: leveled compaction and tombstoned data
http://www.datastax.com/docs/1.1/operations/tuning#testing-compaction-and-compression

Write Survey mode. After you have it up and running you can modify the column family mbean to use LeveledCompactionStrategy on that node to see how your hardware/load fares with LCS.

On Thu, Nov 8, 2012 at 11:33 AM, Aaron Turner synfina...@gmail.com wrote:
> There are also ways to bring up a test node and just run Level Compaction on that. Wish I had a URL handy, but hopefully someone else can find it.

-- 
Ben Coverston
DataStax -- The Apache Cassandra Company
Re: leveled compaction and tombstoned data
Also, to answer your question: LCS is well suited to workloads where overwrites and tombstones come into play. The tombstones are _much_ more likely to be merged with LCS than with STCS.

I would be careful with the patch that was referred to above; it hasn't been reviewed, and from a glance it appears that it will cause an infinite compaction loop if you get more than 4 SSTables at max size.

On Thu, Nov 8, 2012 at 11:41 AM, Brandon Williams dri...@gmail.com wrote:
> This rather handsome fellow wrote a blog about it: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling
Re: leveled compaction and tombstoned data
Thanks for the links! I had forgotten about live sampling.

On Thu, Nov 8, 2012 at 11:41 AM, Brandon Williams dri...@gmail.com wrote:
> This rather handsome fellow wrote a blog about it: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling
Re: leveled compaction and tombstoned data
@ben, thx, we will be deploying 2.2.1 of DSE soon and will try to set up a traffic-sampling node so we can test leveled compaction.

We essentially keep a rolling window of data written once: it is written, then after N days it is deleted, so it seems that leveled compaction should help.
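For when that test happens: on 1.1, a column family can be switched to LCS with a schema update, for example via cassandra-cli. This is only a sketch (the CF name and sstable size are made up), and note that a schema change applies cluster-wide, which is why the write-survey/mbean route mentioned earlier in the thread is the way to trial it on a single node:

```
update column family rolling_data
  with compaction_strategy = 'LeveledCompactionStrategy'
  and compaction_strategy_options = {sstable_size_in_mb: 10};
```

After the change, existing SSTables are gradually reorganized into levels by the compaction process.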
Re: Hinted Handoff runs every ten minutes
Is there a ticket open for this for 1.1.6? We also noticed this after upgrading from 1.1.3 to 1.1.6. Every node runs a 0-row hinted handoff every 10 minutes. N-1 nodes hint to the same node, while that node hints to another node.

On Tue, Oct 30, 2012 at 1:35 PM, Vegard Berget p...@fantasista.no wrote:
> Hi, I have the exact same problem with 1.1.6. HintsColumnFamily consists of one row (Rowkey 00, nothing more). The problem started after upgrading from 1.1.4 to 1.1.6. Every ten minutes HintedHandoffManager starts and finishes after sending 0 rows. .vegard,
>
> ----- Original Message -----
> Sent: Mon, 29 Oct 2012 23:45:30 +0100
> Subject: Re: Hinted Handoff runs every ten minutes
>
> On 29.10.2012 23:24, Stephen Pierce wrote:
>> I'm running 1.1.5; the bug says it's fixed in 1.0.9/1.1.0. How can I check to see why it keeps running HintedHandoff?
>
> You have a tombstone in system.HintsColumnFamily; use the list command in cassandra-cli to check.

-- 
Mike Heffner m...@librato.com Librato, Inc.
Multiple keyspaces vs Multiple CFs
Is it better to have 10 keyspaces with 10 CFs in each keyspace, or 100 keyspaces with 1 CF each? I am talking in terms of memory footprint. Also, I would be interested to know how much better one is over the other. Thanks, Sankalp
Re: Multiple keyspaces vs Multiple CFs
It is better to have one keyspace unless you need to replicate the keyspaces differently. The main reason for this is that changing keyspaces requires an RPC operation. Having 10 keyspaces would mean having 10 connection pools.
Re: Multiple keyspaces vs Multiple CFs
Which connection pool are you talking about?

On Thu, Nov 8, 2012 at 2:19 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
> it is better to have one keyspace unless you need to replicate the keyspaces differently. The main reason for this is that changing keyspaces requires an RPC operation. Having 10 keyspaces would mean having 10 connection pools.
Read during digest mismatch
Hi, Let's say I am reading with consistency TWO and my replication factor is 3, and the read is eligible for global read repair. It will send a request to get the data from one node and a digest request to two. If there is a digest mismatch, from what I am reading in the code it looks like it will get the data from all three nodes and do a resolve of the data before returning to the client. Is that correct, or am I reading the code wrong? Also, if this is correct, it looks like if the third node is in the other DC, the read will slow down even when the consistency was TWO? Thanks, Sankalp
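As a toy model of the behaviour described above (this is an illustration of the idea, not Cassandra's actual read-path code): data is fetched from one replica and digests from the rest; on a digest mismatch, full data is read from every replica and each column resolves to its highest-timestamp value.

```python
import hashlib

def digest(columns):
    """Digest over (name, value, timestamp) tuples; a stand-in for the MD5
    digest a replica returns for a digest read."""
    return hashlib.md5(repr(sorted(columns)).encode()).hexdigest()

def read(replicas, data_node):
    """Simplified digest-read model: `replicas` is a list of column lists,
    `data_node` indexes the replica asked for full data. On digest mismatch,
    full data is fetched from every replica and resolved by timestamp."""
    data = replicas[data_node]
    if all(digest(r) == digest(data) for r in replicas):
        return {name: value for name, value, ts in data}
    # Digest mismatch: read full data everywhere, keep newest per column.
    latest = {}
    for r in replicas:
        for name, value, ts in r:
            if name not in latest or ts > latest[name][1]:
                latest[name] = (value, ts)
    return {name: value for name, (value, ts) in latest.items()}

stale = [("c", "old", 1)]
fresh = [("c", "new", 2)]
print(read([stale, fresh, fresh], data_node=0))  # resolves to {'c': 'new'}
```

In this model the resolve step touches every replica that answered, which matches the intuition in the question: once a mismatch occurs, the slowest responding replica gates the read.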
Re: Strange delay in query
Can it be that you have tons and tons of tombstoned columns in the middle of these two? I've seen plenty of performance issues with wide rows littered with column tombstones (you could check with dumping the sstables...) Just a thought... Josep M. On Thu, Nov 8, 2012 at 12:23 PM, André Cruz andre.c...@co.sapo.pt wrote: These are the two columns in question: = (super_column=13957152-234b-11e2-92bc-e0db550199f4, (column=attributes, value=, timestamp=1351681613263657) (column=blocks, value=A4edo5MhHvojv3Ihx_JkFMsF3ypthtBvAZkoRHsjulw06pez86OHch3K3OpmISnDjHODPoCf69bKcuAZSJj-4Q, timestamp=1351681613263657) (column=hash, value=8_p2QaeRaX_QwJbUWQ07ZqlNHei7ixu0MHxgu9oennfYOGfyH6EsEe_LYO8V8EC_1NPL44Gx8B7UhYV9VSb7Lg, timestamp=1351681613263657) (column=icon, value=image_jpg, timestamp=1351681613263657) (column=is_deleted, value=true, timestamp=1351681613263657) (column=is_dir, value=false, timestamp=1351681613263657) (column=mime_type, value=image/jpeg, timestamp=1351681613263657) (column=mtime, value=1351646803, timestamp=1351681613263657) (column=name, value=/Mobile Photos/Photo 2012-10-28 17_13_50.jpeg, timestamp=1351681613263657) (column=revision, value=13957152-234b-11e2-92bc-e0db550199f4, timestamp=1351681613263657) (column=size, value=1379001, timestamp=1351681613263657) (column=thumb_exists, value=true, timestamp=1351681613263657)) = (super_column=40b7ae4e-2449-11e2-8610-e0db550199f4, (column=attributes, value={posix: 420}, timestamp=1351790781154800) (column=blocks, value=9UCDkHNb8-8LuKr2bv9PjKcWCT0v7FCZa0ebNSflES4-o7QD6eYschVaweCKSbR29Dq2IeGl_Cu7BVnYJYphTQ, timestamp=1351790781154800) (column=hash, value=kao2EV8jw_wN4EBoMkCXZWCwg3qQ0X6m9_X9JIGkEkiGKJE_JeKgkdoTAkAefXgGtyhChuhWPlWMxl_tX7VZUw, timestamp=1351790781154800) (column=icon, value=text_txt, timestamp=1351790781154800) (column=is_dir, value=false, timestamp=1351790781154800) (column=mime_type, value=text/plain, timestamp=1351790781154800) (column=mtime, value=1351378576, timestamp=1351790781154800) 
(column=name, value=/Documents/VIMDocument.txt, timestamp=1351790781154800) (column=revision, value=40b7ae4e-2449-11e2-8610-e0db550199f4, timestamp=1351790781154800) (column=size, value=13, timestamp=1351790781154800) (column=thumb_exists, value=false, timestamp=1351790781154800)) I don't think their size is an issue here. André On Nov 8, 2012, at 6:04 PM, Andrey Ilinykh ailin...@gmail.com wrote: What is the size of columns? Probably those two are huge. On Thu, Nov 8, 2012 at 4:01 AM, André Cruz andre.c...@co.sapo.pt wrote: On Nov 7, 2012, at 12:15 PM, André Cruz andre.c...@co.sapo.pt wrote: This error also happens on my application that uses pycassa, so I don't think this is the same bug. I have narrowed it down to a slice between two consecutive columns. Observe this behaviour using pycassa: DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'), column_count=2, column_start=uuid.UUID('13957152-234b-11e2-92bc-e0db550199f4')).keys() DEBUG 2012-11-08 11:55:51,170 pycassa_library.pool:30 6849 139928791262976 Connection 52905488 (xxx:9160) was checked out from pool 51715344 DEBUG 2012-11-08 11:55:53,415 pycassa_library.pool:37 6849 139928791262976 Connection 52905488 (xxx:9160) was checked in to pool 51715344 [UUID('13957152-234b-11e2-92bc-e0db550199f4'), UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')] A two column slice took more than 2s to return. If I request the next 2 column slice: DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'), column_count=2, column_start=uuid.UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')).keys() DEBUG 2012-11-08 11:57:32,750 pycassa_library.pool:30 6849 139928791262976 Connection 52904912 (xxx:9160) was checked out from pool 51715344 DEBUG 2012-11-08 11:57:32,774 pycassa_library.pool:37 6849 139928791262976 Connection 52904912 (xxx:9160) was checked in to pool 51715344 [UUID('40b7ae4e-2449-11e2-8610-e0db550199f4'), UUID('a364b028-2449-11e2-8882-e0db550199f4')] This takes 20msec... 
Is there a rational explanation for this difference in behaviour? Is there some threshold I'm running into? Is there any way to obtain more debugging information about this problem?

Thanks,
André
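Josep's tombstone theory would explain the asymmetry: a slice has to scan past every tombstone sitting between the requested start column and the next live column. A toy sketch of that effect (plain Python over hypothetical data, not Cassandra code):

```python
# Toy illustration (hypothetical data, not Cassandra code): a slice that
# asks for 2 live columns must still scan every tombstoned column lying
# between them, which is why one 2-column slice can be orders of
# magnitude slower than the next.

def slice_columns(row, start, count):
    """Return up to `count` live column names at or after `start`,
    plus how many entries had to be scanned to find them."""
    result, scanned = [], 0
    for name, value in row:
        if name < start:
            continue
        scanned += 1
        if value is None:          # None stands in for a tombstone
            continue
        result.append(name)
        if len(result) == count:
            break
    return result, scanned

# Two live columns separated by 100,000 tombstones.
row = ([("col_a", 1)]
       + [("col_b%06d" % i, None) for i in range(100000)]
       + [("col_z", 2)])
cols, scanned = slice_columns(row, "col_a", 2)
# cols is just ["col_a", "col_z"], but 100,002 entries were scanned.
```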
Re: Multiple keyspaces vs Multiple CFs
Any connection pool. Imagine you have 10 column families in 10 keyspaces. You pull a connection off the pool and the odds are 1 in 10 that it is connected to the keyspace you want. So 9 times out of 10 you need a network round trip just to change the keyspace, or you have to build a keyspace-aware connection pool.

Edward

On Thu, Nov 8, 2012 at 5:36 PM, sankalp kohli kohlisank...@gmail.com wrote:

Which connection pool are you talking about?

On Thu, Nov 8, 2012 at 2:19 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

It is better to have one keyspace unless you need to replicate the keyspaces differently. The main reason for this is that changing keyspaces requires an RPC operation. Having 10 keyspaces would mean having 10 connection pools.

On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli kohlisank...@gmail.com wrote:

Is it better to have 10 keyspaces with 10 CFs each, or 100 keyspaces with 1 CF each? I am talking in terms of memory footprint. I would also be interested to know how much better one is than the other. Thanks, Sankalp
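Edward's 1-in-10 odds can be checked with a small simulation (a hypothetical pool model, not any real driver's code):

```python
# Sketch of the keyspace-unaware pool problem described above
# (hypothetical model, not a real driver): with 10 keyspaces, a randomly
# drawn pooled connection matches the wanted keyspace only ~1 time in 10,
# so ~90% of checkouts pay an extra set_keyspace round trip.
import random

class Connection:
    def __init__(self, keyspace):
        self.keyspace = keyspace
        self.round_trips = 0

    def use(self, keyspace):
        if self.keyspace != keyspace:
            self.round_trips += 1      # simulated set_keyspace RPC
            self.keyspace = keyspace

random.seed(0)
keyspaces = ["ks%d" % i for i in range(10)]
pool = [Connection(random.choice(keyspaces)) for _ in range(100)]

trials, switches = 10000, 0
for _ in range(trials):
    conn = random.choice(pool)         # naive, keyspace-unaware checkout
    before = conn.round_trips
    conn.use(random.choice(keyspaces))
    switches += conn.round_trips - before

switch_rate = switches / float(trials)  # close to 0.9 in this model
```

A keyspace-aware pool would instead index idle connections by keyspace and only fall back to set_keyspace when no matching connection is free.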
Re: Multiple keyspaces vs Multiple CFs
In the old days the API looked like this:

    client.insert("Keyspace1", key_user_id,
                  new ColumnPath("Standard1", null, "name".getBytes("UTF-8")),
                  "Chris Goffinet".getBytes("UTF-8"), timestamp, ConsistencyLevel.ONE);

but now it works like this (note the set_keyspace call):

    client.set_keyspace("keyspace1");
    client.insert(key_user_id,
                  new ColumnPath("Standard1", null, "name".getBytes("UTF-8")),
                  "Chris Goffinet".getBytes("UTF-8"), timestamp, ConsistencyLevel.ONE);

So each time you switch keyspaces you make a network round trip.

On Thu, Nov 8, 2012 at 6:17 PM, sankalp kohli kohlisank...@gmail.com wrote:

I am a bit confused. One connection pool I know of is the one MessagingService has to other nodes. Then there are incoming connections via thrift from clients. How are they affected by multiple keyspaces?
Re: Multiple keyspaces vs Multiple CFs
I think this code is from the thrift part. I use hector. In hector, I can create a keyspace object for each keyspace and use it when I want to talk to that keyspace. Why would it need to do a round trip to the server for each switch?

On Thu, Nov 8, 2012 at 3:28 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

In the old days the API looked like this:

    client.insert("Keyspace1", key_user_id,
                  new ColumnPath("Standard1", null, "name".getBytes("UTF-8")),
                  "Chris Goffinet".getBytes("UTF-8"), timestamp, ConsistencyLevel.ONE);

but now it works like this (note the set_keyspace call):

    client.set_keyspace("keyspace1");
    client.insert(key_user_id,
                  new ColumnPath("Standard1", null, "name".getBytes("UTF-8")),
                  "Chris Goffinet".getBytes("UTF-8"), timestamp, ConsistencyLevel.ONE);

So each time you switch keyspaces you make a network round trip.
Re: Loading data on-demand in Cassandra
Pierre Chalamet pierre at chalamet.net writes:

Hi, you do not need to have 700 GB of data in RAM. Cassandra is able to store data on disk and query it from there if it is not cached in memory. The caches are maintained by C* itself, but you still have to do some configuration. Supposing you want to store around 800 GB with RF=3, you will need at least 6 servers to hold all the data of your db (keeping at most 400 GB per server): 800 x 3 / 400 = 6. There is no native implementation of triggers in C*. However, there is an extension bringing this feature: https://github.com/hmsonline/cassandra-triggers. This should allow you to be notified of mutations (i.e. writes, not queries). Some people on this ML are involved in it; maybe they can help.

Cheers,
- Pierre

From: Oliver Plohmann oliver at objectscape.org
Date: Sun, 12 Aug 2012 21:24:43 +0200
To: user at cassandra.apache.org
Subject: Loading data on-demand in Cassandra

Hello, I'm looking into Cassandra a bit to see whether it would be something to go with for my company. I searched the Internet, looked through the FAQs, etc., but there are still a few open questions. Hope I don't bother anybody with the usual beginner questions... Is there a way to do load-on-demand of data in Cassandra? For the time being, we cannot afford to build a cluster that holds our 700 GB SQL database in RAM, so we need to be able to load data on demand from our relational database. Can this be done in Cassandra? There would then also need to be a way to unload data in order to reclaim RAM. It would also be nice if it were possible to register for an asynchronous notification when some value changes. Can this be done? Thanks for any answers.

Regards, Oliver

I would consider looking into distributed caching technology (ehcache, gemfire).
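Pierre's sizing arithmetic generalizes to a one-line rule: servers >= ceil(data * RF / capacity per server). A tiny helper (names are illustrative):

```python
# Pierre's capacity rule of thumb as a helper (illustrative names):
# enough servers to hold data_gb * replication_factor gigabytes at no
# more than per_server_gb each.
import math

def min_servers(data_gb, replication_factor, per_server_gb):
    return int(math.ceil(data_gb * replication_factor / float(per_server_gb)))

# 800 GB at RF=3, max 400 GB per server -> 800 * 3 / 400 = 6 servers.
servers = min_servers(800, 3, 400)
```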
Re: get_range_slice gets no rowcache support?
I did overlook something. get_range_slice invokes cfs.getRawCachedRow instead of cfs.getThroughCache, hence no row will be cached if it isn't already present in the row cache. Well, this puzzles me further: how is a range of rows expected to get into the row cache in the first place? Would someone please clarify this for me? Thanks in advance.

On Thu, Nov 8, 2012 at 3:23 PM, Manu Zhang owenzhang1...@gmail.com wrote:

I've asked this question before. After reading the source code, I find that get_range_slice doesn't query the row cache before reading from the Memtable and SSTables. I just want to make sure I haven't overlooked something. If my observation is correct, what's the reasoning here?
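The two method names above suggest a read-through vs. peek distinction. A toy model of that split (illustrative Python, not the actual Cassandra source):

```python
# Toy model (illustrative, not Cassandra source) of the two read paths
# named above: the single-row path goes *through* the cache and populates
# it on a miss, while the range-slice path only *peeks* at the cache and
# never populates it.

class RowCache:
    def __init__(self, store):
        self.store = store             # backing data (the "sstables")
        self.cache = {}

    def get_through_cache(self, key):  # single-row read path
        if key not in self.cache:
            self.cache[key] = self.store[key]   # populate on miss
        return self.cache[key]

    def get_raw_cached_row(self, key): # range-slice read path
        return self.cache.get(key)     # peek only; a miss stays a miss

store = {"k1": "row1", "k2": "row2"}
rc = RowCache(store)
rc.get_through_cache("k1")             # a point read caches k1
hit = rc.get_raw_cached_row("k1")      # "row1": cached by the point read
miss = rc.get_raw_cached_row("k2")     # None: the range path never fills it
```

Under this model, rows only enter the cache via single-row reads (or explicit pre-population), which would be consistent with the behaviour described.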
Re: Multiple keyspaces vs Multiple CFs
It is not as bad with hector, but each Keyspace object is still another socket open to Cassandra. If you have 500 webservers and 10 keyspaces, instead of having 500 connections you now have 5000.

On Thu, Nov 8, 2012 at 6:35 PM, sankalp kohli kohlisank...@gmail.com wrote:

I think this code is from the thrift part. I use hector. In hector, I can create a keyspace object for each keyspace and use it when I want to talk to that keyspace. Why would it need to do a round trip to the server for each switch?
Indexing Data in Cassandra with Elastic Search
For those looking to index data in Cassandra with Elastic Search, here is what we decided to do: http://brianoneill.blogspot.com/2012/11/big-data-quadfecta-cassandra-storm.html -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://brianoneill.blogspot.com/ twitter: @boneill42
read request distribution
Hi All,

I am benchmarking Cassandra. I have a three-node cluster with RF=3. I generated 6M rows with sequence numbers from 1 to 6M, so the rows should be evenly distributed among the three nodes, disregarding replicas. The benchmark issues read-only requests for randomly chosen keys from 1 to 6M. Oddly, nodetool cfstats reports that one node receives only half the read requests of another, with the third node in the middle, so the ratio is roughly 2:3:4. The node with the most read requests actually has the smallest latency, and the one with the fewest reports the largest; the difference is pretty big, with the fastest almost double the slowest. All three nodes have exactly the same hardware, and the data size on each node is the same, since with RF=3 every node holds the complete data set. I am using Hector as the client, and the random read requests number in the millions. I can't think of a reasonable explanation. Can someone please shed some light? Thanks.

-Wei
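As a sanity check on the expectation stated above, here is an idealized model (not the actual cluster): with keys drawn uniformly from 1..6M and every node holding a full replica, an evenly balanced client should send each node about a third of the reads, so a sustained 2:3:4 split points at client-side balancing or snitch behaviour rather than data placement.

```python
# Idealized model (not the actual cluster) of a balanced read workload:
# uniformly random keys over 1..6,000,000 routed evenly across three
# nodes that each hold a full replica.
import random
from collections import Counter

random.seed(42)
nodes = ["node1", "node2", "node3"]
counts = Counter()
trials = 60000
for _ in range(trials):
    key = random.randint(1, 6000000)
    counts[nodes[key % 3]] += 1        # stand-in for an even replica choice

shares = {n: counts[n] / float(trials) for n in nodes}
# every share lands near 1/3 in this model
```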
Re: composite column validation_class question
Any thoughts? Thanks.

-Wei

From: Wei Zhu wz1...@yahoo.com
To: Cassandra user group user@cassandra.apache.org
Sent: Wednesday, November 7, 2012 12:47 PM
Subject: composite column validation_class question

Hi All, I am trying to design my schema using composite columns. One thing I am a bit confused about is how to define the validation_class for a composite column, or whether there is a way to define it at all. For a composite column, I might insert different value types depending on the column name. For example, I will insert a date for the column "created":

    set user[1]['7:1:100:created'] = 1351728000;

and insert a String for the description:

    set user[1]['7:1:100:desc'] = 'my description';

I don't see a way to define a validation_class per composite column. Am I right? Thanks.

-Wei
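If, as the question suggests, the schema cannot attach a validator to each composite component, one workaround is to validate in the application, dispatching on the last component of the composite name. A sketch (hypothetical helper, not a pycassa or CLI feature):

```python
# Hypothetical application-level workaround (not a pycassa or CLI
# feature): validate column values in the client, dispatching on the
# last part of the composite column name.

VALIDATORS = {
    "created": int,    # timestamps stored as integers
    "desc": str,       # free-text descriptions
}

def validate(column_name, value):
    """column_name is a composite tuple such as (7, 1, 100, 'created')."""
    expected = VALIDATORS[column_name[-1]]
    if not isinstance(value, expected):
        raise TypeError("%r expects %s, got %r"
                        % (column_name, expected.__name__, value))
    return value

ok_ts = validate((7, 1, 100, "created"), 1351728000)
ok_desc = validate((7, 1, 100, "desc"), "my description")
```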