Re: CompactionExecutor holds 8000+ SSTableReader 6G+ memory
These two fields consumed that memory: CompressedRandomAccessReader.buffer and CompressedRandomAccessReader.compressed, in the queue SSTableReader.dfile.pool. I think SSTableReader.dfile is the cache of the SSTable file.

On Sat, Jun 29, 2013 at 1:09 PM, aaron morton aa...@thelastpickle.com wrote:

> Lots of memory is consumed by the SSTableReader's cache

The file cache is managed by the OS. However, the SSTableReader will have bloom filters and compression metadata, both off heap in 1.2. The Key and Row caches are global, so they are not associated with any one SSTable.

Cheers
- Aaron Morton, Freelance Cassandra Consultant, New Zealand, @aaronmorton, http://www.thelastpickle.com

On 28/06/2013, at 6:23 PM, sulong sulong1...@gmail.com wrote:

Total 100G data per node.

On Fri, Jun 28, 2013 at 2:14 PM, sulong sulong1...@gmail.com wrote:

aaron, thanks for your reply. Yes, I do use the leveled compaction strategy, and the SSTable size is 10M. If it happens again, I will try to enlarge the SSTable size. I just wonder why Cassandra doesn't limit the SSTableReaders' total memory usage when compacting. Lots of memory is consumed by the SSTableReaders' caches. Why not clear these caches at the beginning of compaction?

On Fri, Jun 28, 2013 at 1:14 PM, aaron morton aa...@thelastpickle.com wrote:

Are you running the leveled compaction strategy? If so, what is the max SSTable size and what is the total data per node? If you are running it, try using a larger SSTable size like 32MB.

On 27/06/2013, at 2:02 PM, sulong sulong1...@gmail.com wrote:

According to the OpsCenter records, yes, the compaction was running then, at 8.5 MB/s.

On Thu, Jun 27, 2013 at 9:54 AM, sulong sulong1...@gmail.com wrote:

version: 1.2.2. Cluster read requests 800/s, write requests 22/s. Sorry, I don't know whether the compaction was running then.
On Thu, Jun 27, 2013 at 1:02 AM, Robert Coli rc...@eventbrite.com wrote:

On Tue, Jun 25, 2013 at 10:13 PM, sulong sulong1...@gmail.com wrote:

I have a 4-node Cassandra cluster. Every node has 32G memory, and the Cassandra JVM uses 8G. The cluster is suffering from GC. It looks like the CompactionExecutor thread holds too many SSTableReaders. See the attachment.

What version of Cassandra? What workload? Is compaction actually running?

=Rob
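The question sulong raises above (why doesn't Cassandra cap the total memory held by pooled readers during compaction) amounts to asking for a size-bounded pool. Below is a minimal illustrative sketch of such a bound; this is not Cassandra's dfile.pool implementation, and the class name and sizes are made up:

```python
from collections import OrderedDict

class BoundedBufferPool:
    """Illustrative size-capped pool: evicts least-recently-used
    buffers once the total retained bytes exceed max_bytes."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.total = 0
        self.buffers = OrderedDict()  # name -> byte length

    def put(self, name, size):
        # Returning a buffer to the pool; evict oldest entries if over cap.
        if name in self.buffers:
            self.total -= self.buffers.pop(name)
        self.buffers[name] = size
        self.total += size
        while self.total > self.max_bytes:
            _, evicted = self.buffers.popitem(last=False)
            self.total -= evicted

pool = BoundedBufferPool(max_bytes=1000)
for i in range(20):
    pool.put("sstable-%d" % i, 100)   # 20 x 100 bytes offered
assert pool.total <= 1000             # the pool never retains more than the cap
```

With 2000 bytes offered against a 1000-byte cap, the pool ends up holding the 10 most recent buffers; a cap like this trades repeated buffer allocation for a hard memory ceiling.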
Re: How to do a CAS UPDATE on single column CF?
You're right, there is currently no way to do this, since 1) INSERT can't have an IF currently and 2) UPDATE can't update such a table. We'll fix that: https://issues.apache.org/jira/browse/CASSANDRA-5715

-- Sylvain

On Sat, Jun 29, 2013 at 9:51 PM, Blair Zajac bl...@orcaware.com wrote:

On 6/24/13 8:23 PM, Blair Zajac wrote:

How does one do an atomic update in a column family with a single column? I have this CF:

CREATE TABLE schema_migrations (
    version TEXT PRIMARY KEY,
) WITH COMPACTION = {'class': 'LeveledCompactionStrategy'};

Anyone? Should I raise this on the developer mailing list or open a ticket?

Blair
Re: Cassandra as storage for cache data
Hello, thanks to all for your answers and comments. What we've done:

- increased Java heap memory up to 6 GB
- changed replication factor to 1
- set durable_writes to false
- set memtable_total_space_in_mb to 5000
- set commitlog_total_space_in_mb to 6000

If I understand correctly, the last parameter doesn't matter since we set durable_writes to false. Now the overall performance is much better but still not outstanding. We continue observing quite frequent compactions on every node. According to OpsCenter's graphs, the Java heap never grows above 3.5 GB, so there is enough memory to keep memtables. Why do they still get flushed to disk, triggering compactions?

-- Best regards, Dmitry Olshansky
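One likely explanation worth checking: memtable_total_space_in_mb is a ceiling on the aggregate live size of the memtables themselves, so flushes trigger when memtables approach that ceiling even when the overall heap has plenty of headroom (and the MeteredFlusher also flushes individual high-traffic CFs on its own schedule). A toy version of that threshold check, using the 5000 MB figure from the settings above; the function is illustrative, not Cassandra code:

```python
def should_flush(live_memtable_bytes, memtable_total_space_mb):
    """Illustrative check only: Cassandra flushes the largest memtables
    when their aggregate live size approaches the configured ceiling,
    regardless of how much free heap remains overall."""
    return live_memtable_bytes >= memtable_total_space_mb * 1024 * 1024

# Heap usage overall may sit at 3.5 GB, but once memtables alone
# reach the 5000 MB ceiling, a flush is forced anyway.
assert should_flush(6 * 1024**3, 5000) is True
assert should_flush(1 * 1024**3, 5000) is False
```

So a heap graph staying below 3.5 GB does not by itself mean the memtable ceiling was never hit between samples.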
C* 1.2.5 AssertionError in ColumnSerializer:40
Hi, using C* 1.2.5 I just found a weird AssertionError in our logfiles:

INFO [OptionalTasks:1] 2013-07-01 09:15:43,608 MeteredFlusher.java (line 58) flushing high-traffic column family CFS(Keyspace='Monitoring', ColumnFamily='cfDateOrderedMessages') (estimated 5242880 bytes)
INFO [OptionalTasks:1] 2013-07-01 09:15:43,609 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-cfDateOrderedMessages@2147245119(4616888/5242880 serialized/live bytes, 23714 ops)
INFO [FlushWriter:9] 2013-07-01 09:15:43,610 Memtable.java (line 461) Writing Memtable-cfDateOrderedMessages@2147245119(4616888/5242880 serialized/live bytes, 23714 ops)
ERROR [FlushWriter:9] 2013-07-01 09:15:44,145 CassandraDaemon.java (line 192) Exception in thread Thread[FlushWriter:9,5,main]
java.lang.AssertionError
    at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:40)
    at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:30)
    at org.apache.cassandra.db.OnDiskAtom$Serializer.serializeForSSTable(OnDiskAtom.java:62)
    at org.apache.cassandra.db.ColumnIndex$Builder.add(ColumnIndex.java:181)
    at org.apache.cassandra.db.ColumnIndex$Builder.build(ColumnIndex.java:133)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:185)
    at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:489)
    at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:448)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

I looked into the code and it seems to be coming from the following:

    public void serialize(IColumn column, DataOutput dos) throws IOException {
        assert column.name().remaining() > 0;  // crash
        ByteBufferUtil.writeWithShortLength(column.name(), dos);
        try {...

Does anybody have an idea why this is happening? The machine has some issues with its disks, but flush shouldn't be affected by bad disks, right? I can rule out that this memtable was filled by a bad commitlog.

Thanks, Christian
10,000s of column families/keyspaces
Hi all,

I know it's an old topic, but I want to see if anything's changed on the number of column families that C* supports, either in 1.2.x or 2.x.

For a number of reasons [1], we'd like to support multi-tenancy via separate column families. The problem is that there are around 5,000 tenants to support, and each one needs a small handful of column families. The last I heard, C* supports 'a couple of hundred' column families before things start to bog down.

What will it take for C* to support 50,000 column families? I'm about to dive into the code and run some tests, but I was curious about how to quantify the overhead of a column family. Is the reason performance? Memory? Does the off-heap work help here?

Thanks, Kirk

[1] The main three reasons:
1. ability to wholesale drop data for a given tenant via drop keyspace/drop CFs
2. ability to have divergent schema for each tenant (partially effected by DSE Solr integration)
3. secondary indexes per tenant (given requirement #2)
Re: 10,000s of column families/keyspaces
We use playorm to do 80,000 virtual column families (a playorm feature, though the pattern could be copied). We found out later, and are working on this now, that we wanted to map the 80,000 virtual CFs into 10 real CFs so leveled compaction can run more in parallel; otherwise we get stuck with single-threaded LCS at the last tier, which can take a while. We are about to map/reduce our dataset into our newest format.

Dean

From: Kirk True kirktrue...@gmail.com
Date: Monday, July 1, 2013 10:19 AM
Subject: 10,000s of column families/keyspaces
[quoted original snipped]
Re: 10,000s of column families/keyspaces
Oh, and if you are using STCS, I don't think the below is an issue at all, since that can already run in parallel if needed.

Dean

On 7/1/13 10:24 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

> ...we wanted to map 80,000 virtual CF's into 10 real CF's so leveled compaction can run more in parallel, or else we get stuck with single-threaded LCS at the last tier, which can take a while. [snip]
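The virtual-CF mapping Dean describes can be sketched without playorm: hash each virtual column family name onto a small fixed set of real CFs, and prefix row keys so tenants stay disjoint inside the shared CF. All names below are hypothetical; this shows the pattern, not playorm's code:

```python
import hashlib

def real_cf(virtual_cf, num_real_cfs=10):
    """Map a virtual column family onto one of a fixed set of real CFs.
    Hashing keeps the assignment stable across restarts and nodes."""
    h = int(hashlib.md5(virtual_cf.encode()).hexdigest(), 16)
    return "data_%d" % (h % num_real_cfs)

def physical_row_key(virtual_cf, row_key):
    # Prefix the row key so rows from different virtual CFs never collide.
    return "%s:%s" % (virtual_cf, row_key)

cf = real_cf("tenant42_orders")
assert cf.startswith("data_")
assert real_cf("tenant42_orders") == cf   # deterministic assignment
assert physical_row_key("tenant42_orders", "r1") == "tenant42_orders:r1"
```

With 10 real CFs instead of 80,000, each one carries enough data for compaction to proceed in parallel across CFs rather than serializing on one.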
Re: CorruptBlockException
On Sat, Jun 29, 2013 at 8:39 PM, Glenn Thompson gatman1...@gmail.com wrote:

> I'm Glenn Thompson and new to Cassandra. I have been trying to figure out how to recover from a CorruptBlockException. ... One of my nodes must have a hardware problem, although I've been unable to find anything wrong via logs, smart, or mce. ... The repair, scrub, and decommission all produced Exceptions related to the same few corrupt files.

A hardware problem sounds relatively likely, especially if you have not crashed your nodes. The only other thing I can think of is an issue with the relationship of the compression library and the JVM. What JVM/JDK are you using, and what compression method is in use on the Column Family?

In general, the actions you took were reasonable. Do you have the full stack trace?

=Rob
Re: CorruptBlockException
Hi Rob,

It was hardware. Memory. I've been loading data since I originally posted; no exceptions so far.

I had some issues with OOMs when I first started playing with Cassandra. I increased the amount of RAM for the VM and reduced the memtable size. I'm guessing it's because I'm using I3s. More cores would most likely improve GC performance. I put all the logs and my configs on my Google Drive; the link is in the original post.

I'm running 1.2.4. There have been two releases since my original download, and I'm going to attempt an upgrade soon. I'm also considering using leveled compaction. I just have two 750GB drives per node, and I'd like to use more than 50% of the drives if I can.

Thanks, Glenn

On Mon, Jul 1, 2013 at 11:08 AM, Robert Coli rc...@eventbrite.com wrote:
[quoted reply snipped]
Re: How to do a CAS UPDATE on single column CF?
Thanks!

On 7/1/13 1:41 AM, Sylvain Lebresne wrote:
[quoted reply snipped]
Re: How to do a CAS UPDATE on single column CF?
What does CAS stand for? And is that the row-locking feature like HBase's setAndReadWinner, where you give the previous val and next val, and your next val is returned if you won; otherwise the current result is returned and you know some other node won?

Thanks, Dean

On 7/1/13 12:09 PM, Blair Zajac bl...@orcaware.com wrote:
[quoted reply snipped]
Re: Patterns for enabling Compute apps which only request Local Node's
On Sun, Jun 30, 2013 at 1:48 AM, rekt...@voodoowarez.com wrote:

> Question: if we're co-locating our Cassandra and our compute application on the same nodes, are there any in-use patterns in Cassandra user (or Cassandra dev) applications for having the compute application only pull data off the localhost Cassandra process? If we have the ability to manage where we do compute, what options are there for keeping compute happening on local data as much as possible?

The Hadoop support provides Hadoop-like support for locality. One presumes you could make use of this functionality even if you were not actually running Hadoop map/reduce as the compute application.

http://wiki.apache.org/cassandra/HadoopSupport#ClusterConfig

=Rob
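The locality idea behind the Hadoop integration is to schedule each piece of compute on a node that replicates the data it reads. That scheduling step can be sketched generically; the ring map below is a hypothetical stand-in for the token-range ownership information Cassandra exposes (e.g. via the Thrift describe_ring call):

```python
def assign_local_ranges(ring, local_node):
    """Given a map of node -> token ranges it replicates, return the
    ranges a co-located compute job should read from its local node.
    Illustrative only; real code would fetch the ring from Cassandra."""
    return sorted(ring.get(local_node, []))

# Hypothetical 3-node ring with two ranges per node.
ring = {
    "10.0.0.1": [(0, 100), (300, 400)],
    "10.0.0.2": [(100, 200), (400, 500)],
    "10.0.0.3": [(200, 300), (500, 600)],
}
assert assign_local_ranges(ring, "10.0.0.2") == [(100, 200), (400, 500)]
```

With replication factor > 1, each range appears under several nodes, so a scheduler also needs a tie-break to avoid reading the same range twice.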
Re: How to do a CAS UPDATE on single column CF?
http://en.wikipedia.org/wiki/Compare-and-swap

I believe C* uses Paxos for CAS, but I'm not completely sure.

-- Francisco Andrades Grassi, www.bigjocker.com, @bigjocker

On Jul 1, 2013, at 1:49 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
[quoted reply snipped]
Re: How to do a CAS UPDATE on single column CF?
According to Jonathan Ellis' talk at Cassandra Summit 2013, it does use Paxos:

http://www.youtube.com/watch?v=PcUpPR4nSr4&list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU
http://www.slideshare.net/jbellis/cassandra-summit-2013-keynote

Andy

On 1 Jul 2013, at 19:40, Francisco Andrades Grassi bigjoc...@gmail.com wrote:
[quoted reply snipped]

The University of Dundee is a registered Scottish Charity, No: SC015096
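For readers following the thread, compare-and-swap itself is simple to state in code. Below is a local, single-process sketch of the semantics; Cassandra reaches the same decision across replicas via Paxos, and the (applied, value) return shape is just an analogy to CQL's "[applied]" result row:

```python
import threading

class Cell:
    """A single-value cell with compare-and-swap semantics."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        # Atomically: write `new` only if the current value equals `expected`.
        # Returns (applied, current_value).
        with self._lock:
            if self._value == expected:
                self._value = new
                return True, new
            return False, self._value

cell = Cell("v1")
assert cell.compare_and_swap("v1", "v2") == (True, "v2")   # we won
assert cell.compare_and_swap("v1", "v3") == (False, "v2")  # someone else won
```

The second call answers Dean's question: the loser learns the current value instead of silently overwriting it, which is exactly what plain last-write-wins updates cannot give you.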
Re: 10,000s of column families/keyspaces
On Mon, Jul 1, 2013 at 9:19 AM, Kirk True kirktrue...@gmail.com wrote:

> What will it take for C* to support 50,000 column families?

As I understand it, a (the?) big problem with huge numbers of column families is that each ColumnFamily has a large number of MBeans associated with it, each of which consumes heap. So... a lot fewer MBeans per column family and/or MBean stuff not consuming heap? Then you still have the problem of each CF having at least one live memtable, which even if empty will still consume heap.

I'm thinking the real answer to what it will take for C* to support 50k CFs is a JVM which can functionally support heap sizes over 8GB... which seems unlikely to happen any time soon.

=Rob
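A back-of-envelope calculation shows why per-CF heap overhead dominates at this scale. The 1 MB/CF figure below is purely an assumed placeholder for memtable + MBean + metadata cost; measure your own cluster before relying on any such number:

```python
# Rough heap estimate for many column families.
# per_cf_overhead_mb is a hypothetical figure, not a measured one.
per_cf_overhead_mb = 1.0
num_cfs = 50000
heap_needed_gb = per_cf_overhead_mb * num_cfs / 1024

assert round(heap_needed_gb, 1) == 48.8   # ~49 GB: far beyond an 8 GB heap
```

Even if the real per-CF cost were a quarter of that guess, 50,000 CFs would still need more heap than the ~8 GB ceiling Rob mentions.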
Re: 10,000s of column families/keyspaces
There is another problem. You now need to run repair for a large number of column families and keyspaces and manage that, look out for schema mismatches, etc.

On Mon, Jul 1, 2013 at 4:09 PM, Robert Coli rc...@eventbrite.com wrote:
[quoted reply snipped]
Re: Cassandra as storage for cache data
The most effective way to deal with obsolete tombstones in the short-lived cache case seems to be to drop them on the floor en masse... :D

a) have two column families that the application alternates between, modulo time_period
b) truncate and populate the cold one
c) read from the hot one
d) clear snapshots frequently

This avoids the downsides of dealing with tombstones entirely, with only the cost of increased complexity to manage snapshots. One could (NOT RECOMMENDED) also disable automatic snapshotting on truncate...

=Rob

PS - apparently in the past this would have resulted in the schema CF growing without bound, but that is no longer the case...
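The a)-d) scheme above can be sketched as a pure function of time: the hot CF serving reads is picked by the current period modulo two, and the other CF is cold and free to truncate and repopulate. The CF names and one-hour period are illustrative choices, not anything Cassandra mandates:

```python
import time

CACHE_CFS = ("cache_a", "cache_b")

def hot_cf(now=None, period_seconds=3600):
    """Column family currently serving reads (step c above)."""
    now = time.time() if now is None else now
    return CACHE_CFS[int(now // period_seconds) % len(CACHE_CFS)]

def cold_cf(now=None, period_seconds=3600):
    """The other CF: safe to truncate and repopulate (step b above)."""
    hot = hot_cf(now, period_seconds)
    return CACHE_CFS[1 - CACHE_CFS.index(hot)]

# At t=0 reads go to cache_a; one period later the roles swap.
assert hot_cf(now=0) == "cache_a"
assert hot_cf(now=3600) == "cache_b"
assert cold_cf(now=0) == "cache_b"
```

Because both sides compute the hot CF from the clock, the application never needs coordination to agree on which CF is being truncated.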
Dynamic Snitch and EC2MultiRegionSnitch
How does the dynamic snitch work with EC2MultiRegionSnitch? Can dynamic routing only happen in one data center? We don't want requests routed to another data center even if nodes are idle on the other side, since the network could be slow.

Thanks in advance, Daning
RE: about FlushWriter All time blocked
Thanks guys, these sound like good suggestions; will try those out. Aaron, we have around 80 CFs.

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Friday, June 28, 2013 10:05 PM
To: user@cassandra.apache.org
Subject: Re: about FlushWriter All time blocked

> We do not use secondary indexes or snapshots

Out of interest, how many CFs do you have?

Cheers
- Aaron Morton, Freelance Cassandra Consultant, New Zealand, @aaronmorton, http://www.thelastpickle.com

On 28/06/2013, at 7:52 AM, Nate McCall zznat...@gmail.com wrote:

Non-zero for pending tasks is too transient. Try monitoring tpstats with a (much) higher frequency and look for a sustained threshold over a duration. Then, using a percentage of the configuration value for the max - 75% of memtable_flush_queue_size in this case - alert when it has been higher than '3' for more than N time. (Start with N=60 seconds and go from there.)

Also, that is a very high 'All time blocked' to 'Completed' ratio for FlushWriter. If iostat is happy, I'd do as Aaron suggested above and turn up memtable_flush_queue_size and play around with turning up memtable_flush_writers (incrementally and separately, of course, so you can see the effect).

On Thu, Jun 27, 2013 at 2:27 AM, Arindam Barua aba...@247-inc.com wrote:

In our performance tests, we are seeing similar FlushWriter, MutationStage, MemtablePostFlusher pending tasks become non-zero. We collect snapshots every 5 minutes, and they seem to clear after ~10-15 minutes though. (The FlushWriter has an 'All time blocked' count of 540 in the below example.) We do not use secondary indexes or snapshots. We do not use SSDs. We have a 4-node cluster with around 30-40 GB of data on each node. Each node has three 1-TB disks in a RAID 0 setup. Currently we monitor the tpstats every 5 minutes, and alert if FlushWriter or MutationStage has a non-zero Pending count.
Any suggestions on whether this is already a cause for concern, or should we alert only if that count becomes greater than a bigger number, say 10, or if the count remains non-zero for longer than a specified time?

Pool Name                  Active  Pending   Completed  Blocked  All time blocked
ReadStage                       0        0    15685133        0                 0
RequestResponseStage            0        0    29880863        0                 0
MutationStage                   0        0    40457340        0                 0
ReadRepairStage                 0        0      704322        0                 0
ReplicateOnWriteStage           0        0           0        0                 0
GossipStage                     0        0     2283062        0                 0
AntiEntropyStage                0        0           0        0                 0
MigrationStage                  0        0          70        0                 0
MemtablePostFlusher             1        1        1837        0                 0
StreamStage                     0        0           0        0                 0
FlushWriter                     1        1        1446        0               540
MiscStage                       0        0           0        0                 0
commitlog_archiver              0        0           0        0                 0
InternalResponseStage           0        0          43        0                 0
HintedHandoff                   0        0           3        0                 0

Thanks, Arindam

-----Original Message-----
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, June 25, 2013 10:29 PM
To: user@cassandra.apache.org
Subject: Re: about FlushWriter All time blocked

> FlushWriter    0    0    191    0    12

This means there were 12 times the code wanted to put a memtable in the queue to be flushed to disk but the queue was full. The length of this queue is controlled by memtable_flush_queue_size (https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L299) and memtable_flush_writers.

When this happens, an internal lock around the commit log is held, which prevents writes from being processed. In general it means the IO system cannot keep up. It can sometimes happen when snapshot is used, as all the CFs are flushed to disk at once. I also suspect it happens sometimes when a commit log segment is flushed and there are a lot of dirty CFs, but I've never proved it.

Increase memtable_flush_queue_size following the help in the yaml file. If you do not use secondary indexes, are you using snapshot?

Hope that helps.
A
- Aaron Morton, Freelance Cassandra Consultant, New Zealand, @aaronmorton, http://www.thelastpickle.com

On 24/06/2013, at 3:41 PM, yue.zhang
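Nate's suggestion quoted above (alert only when pending stays above 75% of memtable_flush_queue_size for a sustained window) can be sketched as a small monitor. The thresholds below are the example numbers from the thread (queue size 4, N=60 seconds); the code is illustrative, not a drop-in tpstats poller:

```python
import collections
import time

def make_monitor(queue_size=4, fraction=0.75, sustain_seconds=60):
    """Return an observe(pending, now) callable that reports True only
    after the pending count has stayed above fraction*queue_size for
    sustain_seconds without interruption."""
    threshold = queue_size * fraction
    breaches = collections.deque()  # timestamps of the current breach run

    def observe(pending, now=None):
        now = time.time() if now is None else now
        if pending > threshold:
            breaches.append(now)
        else:
            breaches.clear()  # any recovery resets the sustained window
        return bool(breaches) and now - breaches[0] >= sustain_seconds

    return observe

observe = make_monitor()
assert observe(4, now=0) is False    # first breach: not sustained yet
assert observe(4, now=30) is False
assert observe(4, now=60) is True    # breached continuously for 60 s
assert observe(0, now=61) is False   # recovered, state resets
```

This filters out the transient non-zero pending counts that a 5-minute snapshot would otherwise page on.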
schema management
Hi,

I've been giving some thought to the way we deploy schemas, and I'm looking for something better than our current approach, which is to use cassandra-cli scripts. What do people use for this?

cheers

--
Franc Carter | Systems architect | Sirca Ltd
franc.car...@sirca.org.au | www.sirca.org.au
Tel: +61 2 8355 2514
Level 4, 55 Harrington St, The Rocks NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215
query deadlock(?) after flushing a table
Hey, I created a table with a wide row. A query on the wide row after removing the entries and flushing the table becomes very slow. I am aware of the impact of tombstones, but it seems that there is a deadlock which prevents the query from completing.

Step by step:

1. Create the keyspace and the table:

CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
use test;
CREATE TABLE job_index (
    stage text,
    timestamp text,
    PRIMARY KEY (stage, timestamp)
) WITH gc_grace_seconds=10
  AND compaction={'sstable_size_in_mb': '10', 'class': 'LeveledCompactionStrategy'};

2. Insert 5000 entries into the job_index column family using the attached script (insert_1-5000.cql)

3. Flush the table: nodetool flush test job_index

4. Delete the 5000 entries in the wide row using the attached script (delete_1-5000.cql)

So far, queries return all the entries in the wide row in a fraction of a second.

5. Flush the table: nodetool flush test job_index

6. Run the following query:

cqlsh:test> SELECT * from job_index limit 1;
Request did not complete within rpc_timeout.

The execution of the query gets blocked and eventually the query times out. In Cassandra's log file I see the following lines:

DEBUG [ScheduledTasks:1] 2013-07-01 19:10:39,469 GCInspector.java (line 121) GC for ParNew: 16 ms for 5 collections, 754590496 used; max is 2093809664
DEBUG [ScheduledTasks:1] 2013-07-01 19:10:40,473 GCInspector.java (line 121) GC for ParNew: 19 ms for 6 collections, 547894840 used; max is 2093809664
DEBUG [ScheduledTasks:1] 2013-07-01 19:10:41,475 GCInspector.java (line 121) GC for ParNew: 16 ms for 5 collections, 771812864 used; max is 2093809664

A few minutes later, after the compaction finishes, the problem goes away. I am using Cassandra 1.2.6. I tested on Linux (CentOS) and MacOS and I get the same result! Is this a known issue?
Re: schema management
You can generate schema through the code. That is also one option.

On Mon, Jul 1, 2013 at 4:10 PM, Franc Carter franc.car...@sirca.org.au wrote:
[quoted message snipped]
Re: schema management
Franc--

I think you will find Mutagen Cassandra very interesting; it is similar to schema management tools like Flyway for SQL databases:

"Mutagen Cassandra is a framework (based on Mutagen) that provides schema versioning and mutation for Apache Cassandra. Mutagen is a lightweight framework for applying versioned changes (known as mutations) to a resource, in this case a Cassandra schema. Mutagen takes into account the resource's existing state and only applies changes that haven't yet been applied. Schema mutation with Mutagen helps you make manageable changes to the schema of live Cassandra instances as you update your software, and is especially useful when used across development, test, staging, and production environments to automatically keep schemas in sync."

https://github.com/toddfast/mutagen-cassandra

Todd

On Mon, Jul 1, 2013 at 5:23 PM, sankalp kohli kohlisank...@gmail.com wrote:
[quoted reply snipped]
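The core of the versioned-migration idea Mutagen implements (apply only the mutations not yet recorded as applied, in version order) fits in a few lines. A simplified sketch of the logic, not Mutagen's actual API:

```python
def pending_mutations(available, applied):
    """Given the mutation versions on disk and the versions already
    recorded as applied, return only the new ones in version order."""
    return sorted(v for v in available if v not in applied)

def run_migrations(available, applied, apply_fn):
    """Apply each pending mutation and record it so it never re-runs."""
    for version in pending_mutations(available, set(applied)):
        apply_fn(version)        # e.g. execute the CQL in that file
        applied.append(version)  # persist this list in a real system
    return applied

ran = []
state = run_migrations(["V002", "V001", "V003"], ["V001"], ran.append)
assert ran == ["V002", "V003"]              # only unapplied versions, in order
assert state == ["V001", "V002", "V003"]
```

In a real tool, the applied list lives in the database itself (Mutagen keeps it in a Cassandra table), so every environment converges on the same schema by replaying only what it is missing.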
Re: schema management
On Tue, Jul 2, 2013 at 10:33 AM, Todd Fast t...@digitalexistence.com wrote:

> Franc-- I think you will find Mutagen Cassandra very interesting; it is similar to schema management tools like Flyway for SQL databases.

Oops - I forgot to mention in my original email that we will be looking into Mutagen Cassandra in the medium term. I'm after something with a low barrier to entry initially, as we are quite time-constrained.

cheers

--
Franc Carter | Systems architect | Sirca Ltd
franc.car...@sirca.org.au | www.sirca.org.au
Tel: +61 2 8355 2514
Level 4, 55 Harrington St, The Rocks NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215
Streaming performance with 1.2.6
Hi,

We've recently been testing some of the higher-performance instance classes on EC2, specifically the hi1.4xlarge, with Cassandra. For those not familiar with them, they have two SSD disks and 10 GigE networking. While we have observed much improved raw performance over our current instances, we are seeing a fairly large gap between Cassandra and raw performance. We have particularly noticed a gap in streaming performance when bootstrapping a new node. I wanted to ensure that we have configured these instances correctly to get the best performance out of Cassandra.

When bootstrapping a new node into a small ring with a 35GB streaming payload, we see a 5-8 MB/sec max streaming rate joining the new node to the ring. We are using 1.2.6 with 256-token vnode support. In our tests the ring is small enough that all streaming occurs from a single node. To test hardware performance for this use case, we ran an rsync of the sstables from one node to the next (to/from the same file systems) and observed a consistent rate of 115 MB/sec.

The only changes we've made to the config (aside from dirs/hosts) are:

-concurrent_reads: 32
-concurrent_writes: 32
+concurrent_reads: 128 # 32
+concurrent_writes: 128 # 32
-rpc_server_type: sync
+rpc_server_type: hsha # sync
-compaction_throughput_mb_per_sec: 16
+compaction_throughput_mb_per_sec: 256 # 16
-read_request_timeout_in_ms: 1
+read_request_timeout_in_ms: 6000 # 1
-endpoint_snitch: SimpleSnitch
+endpoint_snitch: Ec2Snitch # SimpleSnitch
-internode_compression: all
+internode_compression: none

We use a 10G heap with a 2G new size. We are using the Oracle 1.7.0_25 JVM. I've adjusted our streaming throughput limit from 200 MB/sec up to 800 MB/sec on both the sending and receiving streaming nodes, but that doesn't appear to make a difference. The disks are RAID 0 (2 x 1TB SSD) with 512 read-ahead, XFS. The nodes in the ring are running about 23% CPU on average, with spikes up to a maximum of 45% CPU.
As I mentioned, on the same boxes with the same workloads, I've seen up to 115 MB/sec transfers with rsync. Any suggestions for what to adjust to see better streaming performance? Streaming at about 5% of what a single rsync can do seems somewhat limited. Thanks, Mike -- Mike Heffner m...@librato.com Librato, Inc.
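To put the numbers above in perspective, here is a back-of-envelope sketch (plain arithmetic, using only the figures quoted in this thread: a 35 GB payload, 5-8 MB/sec observed streaming, 115 MB/sec rsync baseline) of what the gap costs in wall-clock bootstrap time:

```python
# Rough comparison of observed Cassandra streaming vs. the rsync
# baseline on the same hardware, using the numbers from this thread.
PAYLOAD_GB = 35
STREAM_MB_S = (5, 8)    # observed streaming rate range (MB/sec)
RSYNC_MB_S = 115        # observed raw rsync rate (MB/sec)

payload_mb = PAYLOAD_GB * 1024

def bootstrap_minutes(rate_mb_s):
    """Wall-clock minutes to move the payload at a given rate."""
    return payload_mb / rate_mb_s / 60

stream_minutes = [bootstrap_minutes(r) for r in STREAM_MB_S]
rsync_minutes = bootstrap_minutes(RSYNC_MB_S)
utilization = [r / RSYNC_MB_S * 100 for r in STREAM_MB_S]

print(f"bootstrap at 5-8 MB/s: {stream_minutes[1]:.0f}-{stream_minutes[0]:.0f} min")
print(f"rsync at 115 MB/s:     {rsync_minutes:.1f} min")
print(f"fraction of rsync:     {utilization[0]:.0f}-{utilization[1]:.0f}%")
```

In other words, a bootstrap that raw disk and network could finish in about five minutes is stretched to roughly 75-120 minutes, which matches the "about 5% of rsync" observation above.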
very inefficient operation with tombstones
Querying a table with 5,000 tombstones takes 3 minutes to complete! But querying the same table, with the same data pattern, holding 10,000 live entries takes a fraction of a second!

Details:

1. Created the following table:

CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
use test;
CREATE TABLE job_index ( stage text, timestamp text, PRIMARY KEY (stage, timestamp) );

2. Inserted 5,000 entries into the table:

INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '0001' );
INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '0002' );
...
INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '4999' );
INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '5000' );

3. Flushed the table: nodetool flush test job_index

4. Deleted the 5,000 entries:

DELETE from job_index WHERE stage = 'a' AND timestamp = '0001' ;
DELETE from job_index WHERE stage = 'a' AND timestamp = '0002' ;
...
DELETE from job_index WHERE stage = 'a' AND timestamp = '4999' ;
DELETE from job_index WHERE stage = 'a' AND timestamp = '5000' ;

5. Flushed the table: nodetool flush test job_index

6. Querying the table now takes 3 minutes to complete:

cqlsh:test> SELECT * from job_index limit 2;

tracing: http://pastebin.com/jH2rZN2X

While the query was executing I saw a lot of GC entries in Cassandra's log:

DEBUG [ScheduledTasks:1] 2013-07-01 23:47:59,221 GCInspector.java (line 121) GC for ParNew: 30 ms for 6 collections, 263993608 used; max is 2093809664
DEBUG [ScheduledTasks:1] 2013-07-01 23:48:00,222 GCInspector.java (line 121) GC for ParNew: 29 ms for 6 collections, 186209616 used; max is 2093809664
DEBUG [ScheduledTasks:1] 2013-07-01 23:48:01,223 GCInspector.java (line 121) GC for ParNew: 29 ms for 6 collections, 108731464 used; max is 2093809664

It seems that something very inefficient is happening in managing tombstones. By contrast, if I start with a clean table and do the following:

1. insert 5,000 entries
2. flush to disk
3. insert 5,000 new entries
4. flush to disk

then querying job_index for all 10,000 entries takes a fraction of a second to complete: tracing: http://pastebin.com/scUN9JrP

The fact that iterating over 5,000 tombstones takes 3 minutes while iterating over 10,000 live cells takes a fraction of a second suggests that something very inefficient is happening in managing tombstones. I would appreciate it if a developer could look into this. -M
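The asymmetry between the two cases can be illustrated with a toy model (this is only a sketch of the scanning behavior, not Cassandra's actual read path): a "LIMIT 2" query over live rows can stop as soon as two rows are found, but over a fully deleted partition it must touch every tombstone before it can conclude there is nothing to return.

```python
# Toy model: cells of a partition in clustering order, each flagged as a
# tombstone (True) or a live row (False). A limit query scans until it
# has found `limit` live rows or the partition is exhausted.

def cells_scanned(cells, limit):
    """Return how many cells the scan touches before it can return."""
    live, scanned = 0, 0
    for is_tombstone in cells:
        scanned += 1
        if not is_tombstone:
            live += 1
            if live == limit:
                break  # limit satisfied; stop early
    return scanned

tombstoned_partition = [True] * 5000   # the 5,000 deleted entries
live_partition = [False] * 10000       # the 10,000 live entries

print(cells_scanned(tombstoned_partition, 2))  # touches all 5000 cells
print(cells_scanned(live_partition, 2))        # touches only 2 cells
```

Under this model the tombstoned partition forces 2,500x more work for the same query, even though it holds half as much data, which is consistent with the 3-minutes-vs-subsecond difference reported above.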