In place vnode conversion possible?
Hi, I know that adding a new vnode-enabled DC is the recommended method to convert an existing cluster to vnodes, and that the cassandra-shuffle utility has been removed. That said, I've done some testing and it appears to be possible to perform an in-place conversion as long as all nodes contain all data (3 nodes and replication factor 3, for example), like this, for each node:

- nodetool -h localhost disablegossip (not sure if this is needed)
- in cqlsh localhost: UPDATE system.local SET tokens=$NEWTOKENS WHERE key='local';
- nodetool -h localhost disablethrift (not sure if this is needed)
- nodetool -h localhost drain
- service cassandra restart

The following Python snippet was used to generate $NEWTOKENS for each node (RandomPartitioner, whose token range is 0 to 2**127 - 1):

    import random
    # 256 random tokens, sorted and formatted as a CQL set literal: {'t1', 't2', ...}
    print str([str(x) for x in sorted(random.randint(0, 2**127 - 1) for x in range(256))]).replace('[', '{').replace(']', '}')

I've tested this in a test cluster and it seems to work just fine. Has anyone else done anything similar? Or is manually changing tokens a bad idea that will come back to bite me down the line?

Test cluster configuration --
Cassandra version: 1.2.19
Number of nodes: 3
Keyspace: NetworkTopologyStrategy: {DC1: 1, DC2: 1, DC3: 1}

/ Jonas
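[For anyone trying this, a quick sanity check after the restart is to read the tokens back from the same system table the procedure updates - a sketch:

    SELECT tokens FROM system.local WHERE key = 'local';

nodetool ring should likewise show the node owning 256 token ranges.]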
Understanding what is key and partition key
Hello all, I have read a lot about Cassandra, including about key-value pairs, partition keys, clustering keys, etc. Does the "key" mentioned in "key-value pair" refer to the same thing as the partition key, or are they different?

CREATE TABLE corpus.bigram_time_category_ordered_frequency (
    id bigint,
    word1 varchar,
    word2 varchar,
    year int,
    category varchar,
    frequency int,
    PRIMARY KEY ((year, category), frequency, word1, word2)
);

In this schema, I know (year, category) is the compound partition key and frequency is a clustering key. What is "the key" here?

Thank You!
--
Chamila Dilshan Wijayarathna,
SMIEEE, SMIESL,
Undergraduate, Department of Computer Science and Engineering,
University of Moratuwa.
Re: Understanding what is key and partition key
Correction: year and category form a “composite partition key”. frequency, word1, and word2 are “clustering columns”. The combination of a partition key with clustering columns is a “compound primary key”. Every CQL row will have a partition key by definition, and may optionally have clustering columns. “The key” should just be a synonym for “primary key”, although sometimes people are loosely speaking about “the partition” (which should be “the partition key”) rather than the CQL “row”.

-- Jack Krupansky

From: Chamila Wijayarathna
Sent: Tuesday, December 16, 2014 8:03 AM
To: user@cassandra.apache.org
Subject: Understanding what is key and partition key
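[Mapped onto the schema from the original message, the terms line up like this, with annotations added as CQL comments:

    CREATE TABLE corpus.bigram_time_category_ordered_frequency (
        id bigint,
        word1 varchar,
        word2 varchar,
        year int,
        category varchar,
        frequency int,
        PRIMARY KEY (
            (year, category),         -- composite partition key
            frequency, word1, word2   -- clustering columns
        )                             -- the whole clause: compound primary key
    );
]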
Re: Understanding what is key and partition key
Hi Jack,

So what will be the keys and values of the following CF instance?

 year | category | frequency | word1  | word2       | id
------+----------+-----------+--------+-------------+-------
 2014 | N        |         1 | සියළුම  | යුද්ධ        |   664
 2014 | N        |         1 | එච්     | කාණ්ඩය      | 12526
 2014 | N        |         1 | ගජබා   | සුපර්ක්රොස්   | 25779
 2014 | N        |         1 | බී      | කාණ්ඩය      | 12505

Thank You!
--
Chamila Dilshan Wijayarathna,
SMIEEE, SMIESL,
Undergraduate, Department of Computer Science and Engineering,
University of Moratuwa.
Re: Understanding what is key and partition key
For the first row, the key is: (2014, N, 1, සියළුම, යුද්ධ) and the value-part is (664).

Cheers,
Jens
———
Jens Rantil
Backend engineer
Tink AB
Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se
Re: Understanding what is key and partition key
Hi Jens, Thank You!

--
Chamila Dilshan Wijayarathna,
SMIEEE, SMIESL,
Undergraduate, Department of Computer Science and Engineering,
University of Moratuwa.
Defining DataSet.json for cassandra-unit testing
Hello all, I am trying to test my application using cassandra-unit with the following schema and data given below.

CREATE TABLE corpus.bigram_time_category_ordered_frequency (
    id bigint,
    word1 varchar,
    word2 varchar,
    year int,
    category varchar,
    frequency int,
    PRIMARY KEY ((year, category), frequency, word1, word2)
);

 year | category | frequency | word1  | word2       | id
------+----------+-----------+--------+-------------+-------
 2014 | N        |         1 | සියළුම  | යුද්ධ        |   664
 2014 | N        |         1 | එච්     | කාණ්ඩය      | 12526
 2014 | N        |         1 | ගජබා   | සුපර්ක්රොස්   | 25779
 2014 | N        |         1 | බී      | කාණ්ඩය      | 12505

Since this has a compound primary key, I am not clear on how to define dataset.json [1] for this CF. Can somebody help me on how to do that?

Thank You!

1. https://github.com/jsevellec/cassandra-unit/wiki/What-can-you-set-into-a-dataSet

--
Chamila Dilshan Wijayarathna,
SMIEEE, SMIESL,
Undergraduate, Department of Computer Science and Engineering,
University of Moratuwa.
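[For what it's worth, the rows above correspond to plain CQL inserts like the one below; if I remember right, cassandra-unit can also load a plain CQL script (see its CQLDataSet support) instead of a JSON dataset, which sidesteps the compound-key question entirely. A sketch of the first row:

    INSERT INTO corpus.bigram_time_category_ordered_frequency
        (year, category, frequency, word1, word2, id)
    VALUES (2014, 'N', 1, 'සියළුම', 'යුද්ධ', 664);
]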
Re: batch_size_warn_threshold_in_kb
> You are, of course, free to use batches in your application

I'm not looking to justify the use of batches; I'm looking for the path forward that will give us the Best Results™ both near and long term, for some definition of Best (which would be a balance of client throughput and cluster pressure). If individual writes are best for us, that's what I want to do. If batches are best for us, that's what I want to do. I'm just struggling that I'm not able to reproduce your advice experimentally, and it's not just a few percent difference, it's a 5x to 8x difference. It's really difficult for me to adopt advice blindly when it differs from my own observations by such a substantial amount. That means something is wrong either with my observations or with the advice, and I would really like to know which. I'm not trying to be argumentative or push for a particular approach; I'm trying to resolve an inconsistency.

RE your questions: I'm sorry this turns into a wall of text; simple questions about parallelism and distributed systems rarely can be adequately answered in just a few words. I'm trying to be open and transparent about my testing approach because I want to find out where the disconnect is here. At the same time I'm trying to bridge the knowledge gap, since I'm working with a parallelism toolset with which you're not familiar, and that could obviously have a substantial impact on the results. Hopefully someone else in the community familiar with Scala will notice this and point out if I'm making a fundamental mistake.

1) My original runs were in EC2, being driven by a different server than the Cassandra cluster, but in the same AZ as one of the Cassandra servers (a typical 3-AZ setup for Cassandra). All four instances (3x C*, 1x test driver) were i2.2xl, so they have gigabit network between them.

2) The system was under some moderate other load; this is our test cluster that takes a steady stream of simulated data to provide other developers with something to work against. That load is quite constant and doesn't work these servers particularly hard - only a few thousand records per second typically, with load averages between 1 and 3 most of the time. Unfortunately I'm not successful getting cassandra-stress talking to this cluster because of ssl configuration (it doesn't seem to actually pay attention to the -ts and -tspw command line flags). I can find out if our ops guys would be ok with turning off ssl for a while, but that would break our other applications using the same cluster, and may block our other engineers as a result. So it has farther-reaching implications than just being something I can happily turn on or off at whim. I'm curious how you would expect the performance of my stress tool to differ when the cluster was being overworked - could you explain what you anticipate the change in results to look like? I.e. would single writes remain about constant in performance while batches would degrade?

3) I specifically attempt to control for this by testing three different concurrency models, which I named parallel, scatter, and traverse (just aliases to make it easier to control the driver). You can see the code for the different approaches here - they are pretty similar to each other, but probably involve some knowledge of how concurrency works in Scala to really appreciate the differences: https://gist.github.com/MightyE/1c98912fca104f6138fc/a7db68e72f99ac1215fcfb096d69391ee285c080#file-testsuite-L181-L203

I know you're not a Scala guy, so I'll explain roughly what they do, but the point is that I'm trying hard to control for just having chosen a bad concurrency model:

scatter - Take all of the Statements and call executeAsync() on them as fast as the Session will let me. This is the Unintelligent Brute Force approach, and it's definitely not how I would model a typical production application, as it doesn't attempt to respond to system pressure at all and tries to gobble up as many resources as it can. It uses the Scala Futures system to combine the set of async calls into a single Future that completes when all the futures returned from executeAsync() have completed.

traverse - Give all of the Statements to the Scala Futures system and tell it to call executeAsync() on them all at the rate that it thinks is appropriate. This would be much closer to my recommendation on how to model a production application, because in a real application there's more than a single class of work to be done, and the Futures system schedules both this work and other work intelligently and configurably. It gives us a single awaitable Future that completes when it has finished all of its work and all of the async calls have been completed.

parallel - Use a Scala Parallel collection to
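[For readers following the thread, the two write paths being compared look roughly like this in CQL - a sketch with a hypothetical table, where the batch groups inserts that share a partition key, which is the case where batching tends to help rather than hurt:

    BEGIN UNLOGGED BATCH
        INSERT INTO metrics.events (sensor_id, ts, value) VALUES ('s1', '2014-12-16 10:00:00', 1.0);
        INSERT INTO metrics.events (sensor_id, ts, value) VALUES ('s1', '2014-12-16 10:00:01', 1.2);
    APPLY BATCH;

versus issuing each INSERT as its own (possibly async) statement.]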
does consistency=ALL for deletes obviate the need for tombstones?
Howdy all,

Our use of cassandra unfortunately makes use of lots of deletes. Yes, I know that C* is not well suited to this kind of workload, but that's where we are, and before I go looking for an entirely new data layer I would rather explore whether C* could be tuned to work well for us. However, deletions are never driven by users in our app - deletions always occur by backend processes to clean up data after it has been processed, and thus they do not need to be 100% available. So this made me think, what if I did the following?

- gc_grace_seconds = 0, which ensures that tombstones are never created
- replication factor = 3
- for writes that are inserts, consistency = QUORUM, which ensures that writes can proceed even if 1 replica is slow/down
- for deletes, consistency = ALL, which ensures that when we delete a record it disappears entirely (no need for tombstones)
- for reads, consistency = QUORUM

Also, I should clarify that our data is essentially append only, so I don't need to worry about inconsistencies created by partial updates (e.g. a value gets changed on one machine but not another). Sometimes there will be duplicate writes, but I think that should be fine since the value is always identical.

Any red flags with this approach? Has anyone tried it and have experiences to share? Also, I *think* that this means that I don't need to run repairs, which from an ops perspective is great.

Thanks, as always,
- Ian
Re: does consistency=ALL for deletes obviate the need for tombstones?
No, deletes are always written as a tombstone no matter the consistency. This is because data at rest is written to sstables, which are immutable once written. The tombstone marks that a record in another sstable is now deleted, so a read of that value should be treated as if the value doesn't exist. When sstables are later compacted, several sstables are merged into one and any overlapping values between the tables are condensed into one; values which have a tombstone can be excluded from the new sstable. The GC grace period indicates how long a tombstone should be kept after all underlying values have been compacted away, so that the deleted value can't be resurrected if a node that still knew the value rejoins the cluster.
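[For reference, gc_grace_seconds is a per-table setting; a minimal sketch of adjusting it, with hypothetical keyspace and table names - 864000 seconds (10 days) is the default:

    ALTER TABLE mykeyspace.mytable WITH gc_grace_seconds = 864000;
]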
Re: does consistency=ALL for deletes obviate the need for tombstones?
Tombstones have to be created. The SSTables are immutable, so the data cannot be deleted. Therefore, a tombstone is required. The value you deleted will be physically removed during compaction.

My workload sounds similar to yours in some respects, and I was able to get C* working for me. I have large chunks of data which I periodically replace. I write the new data, update a reference, and then delete the old data. I designed my schema to be tombstone-friendly, and C* works great. For some of my tables I am able to delete entire partitions. Because of the reference that I updated, I never try to access the old data, and therefore the tombstones for these partitions are never read. The old data simply has to wait for compaction. Other tables require deleting records within partitions. These tombstones do get read, so there are performance implications. I was able to design my schema so that no partition ever has more than a few tombstones (one for each generation of deleted data, which is usually no more than one).

Hope this helps.

Robert
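[To illustrate the partition-level delete Robert describes, a sketch with a hypothetical schema (chunk_set_id as partition key, seq as clustering column): deleting by partition key alone writes a single partition tombstone, which is far cheaper for reads to skip than many row-level tombstones:

    -- one partition-level tombstone:
    DELETE FROM archive.chunks WHERE chunk_set_id = 42;

    -- versus one row-level tombstone per deleted clustering key:
    DELETE FROM archive.chunks WHERE chunk_set_id = 42 AND seq = 7;
]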
Re: does consistency=ALL for deletes obviate the need for tombstones?
Ah, makes sense. Thanks for the explanations!

- Ian
Re: does consistency=ALL for deletes obviate the need for tombstones?
When you say “no need for tombstones”, did you actually read that somewhere or were you just speculating? If the former, where exactly?

-- Jack Krupansky

From: Ian Rose
Sent: Tuesday, December 16, 2014 10:22 AM
To: user
Subject: does consistency=ALL for deletes obviate the need for tombstones?
Re: Hinted handoff not working
Nope. I added millions of records and several GB to the cluster while one node was down, and then ran nodetool flush system hints on a couple of nodes that were up, and system/hints has less than 200K in it. Here’s the relevant part of nodetool cfstats system.hints:

Keyspace: system
    Read Count: 28572
    Read Latency: 0.01806502869942601 ms.
    Write Count: 351
    Write Latency: 0.04547008547008547 ms.
    Pending Tasks: 0
        Table: hints
        SSTable count: 1
        Space used (live), bytes: 7446
        Space used (total), bytes: 80062
        SSTable Compression Ratio: 0.2651441528992549
        Number of keys (estimate): 128
        Memtable cell count: 1
        Memtable data size, bytes: 1740

The hints are definitely not being stored.

Robert

On Dec 14, 2014, at 11:44 PM, Jens Rantil jens.ran...@tink.se wrote:

Hi Robert,

Maybe you need to flush your memtables to actually see the disk usage increase? This applies to both hosts.

Cheers,
Jens

On Sun, Dec 14, 2014 at 3:52 PM, Robert Wille rwi...@fold3.com wrote:

I have a cluster with RF=3. If I shut down one node and add a bunch of data to the cluster, I don’t see a bunch of records added to system.hints. Also, du of /var/lib/cassandra/data/system/hints on the nodes that are up shows that hints aren’t being stored. When I start the down node, its data doesn’t grow until I run repair, which then takes a really long time because it is significantly out of date. Is there some magic setting I cannot find in the documentation to enable hinted handoff? I’m running 2.0.11. Any insights would be greatly appreciated.

Thanks

Robert
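[For anyone debugging the same symptom: hint storage is governed by settings in cassandra.yaml on the nodes that remain up - whether hinting is enabled at all, and how long a node may be down before hints stop being collected. The 2.0 defaults are along these lines:

    hinted_handoff_enabled: true
    max_hint_window_in_ms: 10800000  # 3 hours; a node down longer than this stops accruing hints
]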
Re: Cassandra Maintenance Best practices
Hi Jonathan,

QUORUM = (sum_of_replication_factors / 2) + 1, so for us quorum = (2/2) + 1 = 2. Default CL is ONE and RF=2 with two nodes in the cluster. (I am a little confused: what is my read CL and what is my write CL?) So, does it mean that for every WRITE it will write to both the nodes? And for every READ, it will read from both nodes and give the result back to the client? Will DowngradingConsistencyRetryPolicy downgrade the CL if a node is down?

Regards
Neha

On Wed, Dec 10, 2014 at 1:00 PM, Jonathan Haddad j...@jonhaddad.com wrote:

I did a presentation on diagnosing performance problems in production at the US and Euro summits, in which I covered quite a few tools and preventative measures you should know when running a production cluster. You may find it useful: http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/

On OpsCenter - I recommend it. It gives you a nice dashboard. I don't think it's completely comprehensive (but no tool really is) but it gets you 90% of the way there.

It's a good idea to run repairs, especially if you're doing deletes or querying at CL=ONE. I assume you're not using quorum, because on RF=2 that's the same as CL=ALL. I recommend at least RF=3 because if you lose 1 server, you're on the edge of data loss.

On Tue Dec 09 2014 at 7:19:32 PM Neha Trivedi nehajtriv...@gmail.com wrote:

Hi,

We have a two node cluster configuration in production with RF=2, which means that the data is written to both nodes. It's been running for about a month now and has a good amount of data.

Questions:
1. What are the best practices for maintenance?
2. Is OpsCenter required to be installed, or can I manage with the nodetool utility?
3. Is it necessary to run repair weekly?

thanks
regards
Neha
Comprehensive documentation on Cassandra Data modelling
Hi,

I have been having a few exchanges with contributors to the project around what is possible with Cassandra, and a common response that comes up when I describe functionality as broken or missing is that I am not modelling my data correctly. Unfortunately, I cannot seem to find comprehensive documentation on modelling with Cassandra. In particular, I am finding myself modelling by restriction rather than by what I would like to do. Does such documentation exist? If not, is there any effort to create such documentation? The DataStax documentation on data modelling is far too weak to be meaningful.

In particular, I am caught because:
1) I want to search on a specific column to make updates to it after further processing; i.e. I don't know its value on first insert
2) If I want to search on a column, it has to be part of the primary key
3) If a column is part of the primary key, it cannot be edited

So I have a circular dependency.

Thanks,
Jason
Re: Cassandra Maintenance Best practices
CL QUORUM with RF=2 is equivalent to ALL: writes will require acknowledgement from both nodes, and reads will be from both nodes. CL ONE will write to both replicas, but return success as soon as the first one responds; reads will be from one node (the load balancing strategy determines which one).

FWIW I've come around to dislike the downgrading retry policy. I now feel that if I'm using downgrading, I'm effectively going to be using that downgraded policy most of the time under server stress, so in practice that reduced consistency is the effective consistency I'm asking for from my writes and reads.

--
Ryan Svihla
Solution Architect, DataStax
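[To make the read/write CL question concrete: the consistency level is chosen per request by the client, not by the cluster, so your read CL and write CL are whatever your driver or shell sets. In cqlsh, for example:

    cqlsh> CONSISTENCY QUORUM;   -- applies to subsequent reads and writes in this session
    cqlsh> CONSISTENCY;          -- shows the current level
]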
Re: Cassandra Maintenance Best practices
Thanks Ryan. So, as Jonathan recommended, we should have RF=3 with three nodes. Then quorum = 2, so with CL=QUORUM (i.e. a CL of two) I will not need the downgrading retry policy in case one of my nodes goes down.

I can dynamically add a new node to my cluster. Can I change my RF to 3 dynamically, without affecting my nodes?

regards
Neha
Re: Cassandra Maintenance Best practices
You'll have to run repair, and that will involve some load and streaming, but this is a normal use case for Cassandra. Your cluster should be sized, load-wise, to allow repair and bootstrapping of new nodes... otherwise, when you're overwhelmed, you won't be able to add more nodes easily. If you need to reduce the cost of streaming to the existing cluster, just set the streaming throughput on your existing nodes to a lower number like 50 or 25.

--
Ryan Svihla
Solution Architect, DataStax
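[As a concrete sketch of the steps being discussed, with hypothetical keyspace and data-center names: first change the replication settings, then repair each node in turn so the new replicas receive their data:

    ALTER KEYSPACE mykeyspace
        WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};

Then run nodetool repair mykeyspace on each node, one at a time. To throttle streaming as Ryan suggests, nodetool setstreamthroughput 25 lowers the limit (the value is in megabits per second, if memory serves; the default is 200).]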
Re: Cassandra Maintenance Best practices
thanks Ryan. We will get a new node and add it to the cluster. I will mail if I have any question regarding the same.
Re: Comprehensive documentation on Cassandra Data modelling
Data modeling a distributed application could be a book unto itself. However, I will add that modeling by restriction is basically the entire thought process in Cassandra data modeling, since it's a distributed hash table, and a core aspect of that sort of application is that you need to be able to quickly locate which server owns the data you want in the cluster (which is provided by the partition key).

In specific response to your questions:
1) As long as you know the primary key and the column name, this just works. I'm not sure what the problem is.
2) Yes, the partition key tells you which server owns the data; otherwise you'd have to scan all servers to find what you're asking for.
3) I'm not sure I understand this.

To summarize, all modeling can be understood when you embrace two ideas:
1. Querying a single server will be faster than querying many servers.
2. Multiple tables with the same data but with different partition keys are much easier to scale than a single table for which you have to scan the whole cluster for your answer.

If you accept this, you've basically got the key principle down... most other ideas are extensions of this; some nuance includes dealing with tombstones, partition size, and ordering. I can answer any more specifics. I've been meaning to write a series of blog posts on this, but as I stated, it's almost a book unto itself. Data modeling a distributed application requires a fundamental rethink of all the assumptions we've been taught for master/slave style databases.

--
Ryan Svihla
Solution Architect, DataStax
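[A minimal sketch of the "multiple tables, different partition keys" idea, with a hypothetical schema: the same user data is written to two tables so it can be looked up either way without a cluster-wide scan:

    CREATE TABLE app.users_by_id (
        user_id uuid PRIMARY KEY,
        email   text,
        name    text
    );

    CREATE TABLE app.users_by_email (
        email   text PRIMARY KEY,
        user_id uuid,
        name    text
    );

The application writes to both tables on insert; reads pick whichever table's partition key matches the query.]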
Re: Defining DataSet.json for cassandra-unit testing
I'd ask the author of cassandra-unit. I've not personally used that project.

--
Ryan Svihla
Solution Architect, DataStax
Re: Changing replication factor of Cassandra cluster
Repair's performance is going to vary heavily based on a large number of factors; hours for one node to finish is within the range of what I see in the wild. Again, there are so many factors that it's impossible to speculate on whether that is good or bad for your cluster. Factors that matter include:

1. speed of disk io
2. amount of ram and cpu on each node
3. network interface speed
4. is this multi-dc or not
5. are vnodes enabled or not
6. what are the jvm tunings
7. compaction settings
8. current load on the cluster
9. streaming settings

Suffice it to say, improving repair performance is a full-on tuning exercise. Note that your current operation is going to be worse than a traditional repair, as you're streaming copies of data around and not just doing normal Merkle tree work.

Restoring from backup to a new cluster (including how to handle token ranges) is discussed in detail here: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html

On Mon, Dec 15, 2014 at 4:14 PM, Pranay Agarwal agarwalpran...@gmail.com wrote:

Hi All,

I have a 20 node cassandra cluster with 500gb of data and a replication factor of 1. I increased the replication factor to 3 and ran nodetool repair on each node one by one, as the docs say. But it takes hours for 1 node to finish repair. Is that normal, or am I doing something wrong?

Also, I took a backup of the cassandra data on each node. How do I restore the graph in a new cluster of nodes using the backup? Do I have to have the token ranges backed up as well?

-Pranay

--
Ryan Svihla
Solution Architect, DataStax
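[For reference, the per-node sequence being described can be throttled and scoped roughly like this (keyspace name hypothetical); -pr repairs only each node's primary token ranges, so running it on every node covers the ring once instead of RF times - though whether -pr fits a given RF-change scenario is worth verifying against the docs:

    nodetool setstreamthroughput 25    # throttle streaming first, if needed
    nodetool repair -pr mykeyspace     # on each node, one at a time
]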
Re: Comprehensive documentation on Cassandra Data modelling
Ryan, Thanks for the response. It offers a bit more clarity. I think a series of blog posts with good real world examples would go a long way to increasing usability of Cassandra. Right now I find the process like going through a mine field because I only discover what is not possible after trying something that I would find logical and failing. For my specific questions, the problem is that since searching is only possible on columns in the primary key and the primary key cannot be updated, I am not sure what the appropriate solution is when data exists that needs to be searched and then updated. What is the preferrable approach to this? Is the expectation to maintain a series of tables, one for each stage of data manipulation with its own primary key? Thanks, Jason From: Ryan Svihla rsvi...@datastax.com To: user@cassandra.apache.org Sent: Tuesday, December 16, 2014 12:36 PM Subject: Re: Comprehensive documentation on Cassandra Data modelling Data Modeling a distributed application could be a book unto itself. However, I will add, modeling by restriction is basically the entire thought process in Cassandra data modeling since it's a distributed hash table and a core aspect of that sort of application is you need to be able to quickly locate which server owns the data you want in the cluster (which is provided by the partition key). in specific response to your questions 1) as long as you know the primary key and the column name this just works. I'm not sure what the problem is 2) Yes, the partition key tells you which server owns the data, otherwise you'd have to scan all servers to find what you're asking for. 3) I'm not sure I understand this. To summarize, all modeling can be understood when you embrace the idea that : - Querying a single server will be faster than querying many servers - Multiple tables with the same data but with different partition keys is much easier to scale that a single table that you have to scan the whole cluster for your answer. If you accept this, you've basically got the key principle down...most other ideas are extensions of this, some nuance includes dealing with tombstones, partition size and order. and I can answer any more specifics. I've been meaning to write a series of blog posts on this, but as I stated, it's almost a book unto itself. Data modeling a distributed application requires a fundamental rethink of all the assumptions we've been taught for master/slave style databases. On Tue, Dec 16, 2014 at 10:46 AM, Jason Kania jason.ka...@ymail.com wrote: Hi, I have been having a few exchanges with contributors to the project around what is possible with Cassandra and a common response that comes up when I describe functionality as broken or missing is that I am not modelling my data correctly. Unfortunately, I cannot seem to find comprehensive documentation on modelling with Cassandra. In particular, I am finding myself modelling by restriction rather than what I would like to do. Does such documentations exist? If not, is there any effort to create such documentation?The DataStax documentation on data modelling is far too weak to be meaningful. 
In particular, I am caught because: 1) I want to search on a specific column to make updates to it after further processing; i.e., I don't know its value on first insert. 2) If I want to search on a column, it has to be part of the primary key. 3) If a column is part of the primary key, it cannot be edited, so I have a circular dependency. Thanks, Jason

-- Ryan Svihla, Solution Architect, DataStax
Re: Comprehensive documentation on Cassandra Data modelling
There is a lot of stuff out there, and the best thing you can do today is watch Patrick McFadin's series; this was what I used before I started at DataStax. Planet Cassandra has a data modeling playlist of videos you can watch, https://www.youtube.com/playlist?list=PLqcm6qE9lgKJoSWKYWHWhrVupRbS8mmDA, including the McFadin videos I mentioned. Finally, you hit a key point: a series of tables is the normal approach to most data modeling. You model your tables around the queries you need; with the exception of the nuance I referred to in the last email, this one concept will get you through 80% of use cases fine (a sketch of the pattern follows this message).

-- Ryan Svihla, Solution Architect, DataStax
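To make the series-of-tables advice concrete, and to address Jason's circular dependency, here is a minimal sketch using the DataStax Python driver. All names (keyspace, tables, columns) are illustrative, not from the thread: the searchable column lives in its own lookup table, so "updating" it is a delete plus an insert there, while the base row keyed by id stays put.

# Sketch: one table per query pattern. The "status" column is searchable via
# a lookup table; changing it never touches a primary key in place.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("my_keyspace")  # placeholders

session.execute("""
    CREATE TABLE IF NOT EXISTS readings_by_id (
        id uuid PRIMARY KEY,
        status text,
        payload text)""")
session.execute("""
    CREATE TABLE IF NOT EXISTS readings_by_status (
        status text,
        id uuid,
        PRIMARY KEY (status, id))""")

def set_status(reading_id, old_status, new_status):
    # "Updating" the searched column: remove the old lookup row, add the new
    # one, and update the authoritative row. A logged batch could make these
    # three statements atomic, at the cost of batchlog overhead.
    session.execute("DELETE FROM readings_by_status WHERE status=%s AND id=%s",
                    (old_status, reading_id))
    session.execute("INSERT INTO readings_by_status (status, id) VALUES (%s, %s)",
                    (new_status, reading_id))
    session.execute("UPDATE readings_by_id SET status=%s WHERE id=%s",
                    (new_status, reading_id))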
Re: does consistency=ALL for deletes obviate the need for tombstones?
I was speculating. From the responses above, it now appears to me that tombstones serve (at least) two distinct roles: 1. When reading within a single Cassandra instance, they mark a new version of a value (that value being deleted). Without this, the prior version would be the most recent, and so reads would still return the last value even after it was deleted. 2. They can resolve discrepancies when a client read receives conflicting answers from Cassandra nodes (e.g. where one of the nodes is out of date because it never saw the delete command). So in the above I was only referring to #2, without realizing the role they play in #1. - Ian

On Tue, Dec 16, 2014 at 11:12 AM, Jack Krupansky j...@basetechnology.com wrote: When you say “no need for tombstones”, did you actually read that somewhere or were you just speculating? If the former, where exactly? -- Jack Krupansky

From: Ian Rose ianr...@fullstory.com Sent: Tuesday, December 16, 2014 10:22 AM To: user user@cassandra.apache.org Subject: does consistency=ALL for deletes obviate the need for tombstones?

Howdy all, Our use of Cassandra unfortunately makes use of lots of deletes. Yes, I know that C* is not well suited to this kind of workload, but that's where we are, and before I go looking for an entirely new data layer I would rather explore whether C* could be tuned to work well for us. However, deletions are never driven by users in our app; deletions always occur by backend processes to clean up data after it has been processed, and thus they do not need to be 100% available. So this made me think, what if I did the following?
- gc_grace_seconds = 0, which ensures that tombstones are never created
- replication factor = 3
- for writes that are inserts, consistency = QUORUM, which ensures that writes can proceed even if 1 replica is slow/down
- for deletes, consistency = ALL, which ensures that when we delete a record it disappears entirely (no need for tombstones)
- for reads, consistency = QUORUM
Also, I should clarify that our data is essentially append-only, so I don't need to worry about inconsistencies created by partial updates (e.g. a value gets changed on one machine but not another). Sometimes there will be duplicate writes, but I think that should be fine since the value is always identical. Any red flags with this approach? Has anyone tried it and have experiences to share? Also, I *think* that this means that I don't need to run repairs, which from an ops perspective is great. Thanks, as always, - Ian
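For anyone who wants to see the mechanics of Ian's proposal, a minimal sketch with the DataStax Python driver follows. It only shows how to pin per-statement consistency levels and is not an endorsement of gc_grace_seconds = 0; keyspace and table names are placeholders.

import uuid

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("my_keyspace")  # placeholder
# hypothetical table: CREATE TABLE events (id uuid PRIMARY KEY, body text)

insert = SimpleStatement(
    "INSERT INTO events (id, body) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM)
delete = SimpleStatement(
    "DELETE FROM events WHERE id = %s",
    consistency_level=ConsistencyLevel.ALL)

event_id = uuid.uuid4()
session.execute(insert, (event_id, "processed payload"))
# A delete at ALL raises WriteTimeout/Unavailable if any replica misses it,
# and the caller must retry until it succeeds: with gc_grace_seconds = 0
# there is no grace window for repair to propagate a missed delete.
session.execute(delete, (event_id,))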
100% CPU utilization, ParNew and never completing compactions
I have a three node cluster that has been sitting at a load of 4 (for each node) and 100% CPU utilization (although 92% nice) for the last 12 hours, ever since some significant writes finished. I'm trying to determine what tuning I should be doing to get it out of this state. The debug log is just an endless series of:

DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634880
DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is 8000634880
DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is 8000634880

iostat shows virtually no I/O. Compaction may enter into this, but I don't really know what to make of the compaction stats, since they never change:

[root@cassandra-37919c3a ~]# nodetool compactionstats
pending tasks: 10
   compaction type   keyspace   table              completed    total         unit    progress
   Compaction        media      media_tracks_raw   271651482    563615497     bytes   48.20%
   Compaction        media      media_tracks_raw   30308910     21676695677   bytes    0.14%
   Compaction        media      media_tracks_raw   1198384080   1815603161    bytes   66.00%
Active compaction remaining time : 0h22m24s

Five minutes later:

[root@cassandra-37919c3a ~]# nodetool compactionstats
pending tasks: 9
   compaction type   keyspace   table              completed    total         unit    progress
   Compaction        media      media_tracks_raw   271651482    563615497     bytes   48.20%
   Compaction        media      media_tracks_raw   30308910     21676695677   bytes    0.14%
   Compaction        media      media_tracks_raw   1198384080   1815603161    bytes   66.00%
Active compaction remaining time : 0h22m24s

Sure, the pending tasks went down by one, but the rest is identical. media_tracks_raw likely has a bunch of tombstones (I can't figure out how to get stats on that). Is this behavior something that indicates that I need more heap or a larger new generation? Should I be manually running compaction on tables with lots of tombstones? Any suggestions or places to educate myself better on performance tuning would be appreciated. arne
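On the aside about not being able to get tombstone stats: assuming your distribution ships the sstablemetadata tool (it lives under tools/bin in the 2.0 tarballs), each SSTable can be asked for its droppable-tombstone estimate. A sketch, with a placeholder data path:

# Sketch: print the tombstone-related lines that sstablemetadata reports for
# each SSTable of a table. Assumes the tool is on the PATH and that the data
# directory below matches your installation (placeholder path).
import glob
import subprocess

DATA_GLOB = "/var/lib/cassandra/data/media/media_tracks_raw/*-Data.db"

for sstable in sorted(glob.glob(DATA_GLOB)):
    out = subprocess.check_output(["sstablemetadata", sstable]).decode()
    for line in out.splitlines():
        if "tombstone" in line.lower():
            print("%s: %s" % (sstable, line.strip()))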
Re: 100% CPU utilization, ParNew and never completing compactions
Hello, What version of Cassandra are you running? If it's 2.0, we recently experienced something similar with 8447 [1], which 8485 [2] should hopefully resolve. Please note that 8447 is not related to tombstones. Tombstone processing can put a lot of pressure on the heap as well. Why do you think you have a lot of tombstones in that one particular table? [1] https://issues.apache.org/jira/browse/CASSANDRA-8447 [2] https://issues.apache.org/jira/browse/CASSANDRA-8485

Jonathan Lacefield, Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
Re: 100% CPU utilization, ParNew and never completing compactions
What's heap usage at?

-- Ryan Svihla, Solution Architect, DataStax
Re: 100% CPU utilization, ParNew and never completing compactions
I'm running 2.0.10. The data is all time series data, and as we change our pipeline we've been periodically reprocessing the data sources, which causes each time series to be overwritten; i.e., every row per partition key is deleted and re-written, so I assume I've been collecting a bunch of tombstones. Also, I assumed the ever-present and never-completing compaction tasks were an artifact of tombstoning, but I fully admit that is conjecture based on the ~20 blog posts and Stack Overflow questions I've surveyed. I doubled the heap on one node and it changed nothing regarding the load or the ParNew log statements. New generation usage is 50%, Eden itself is 56%. Anything else I should look at and report, let me know.
Re: 100% CPU utilization, ParNew and never completing compactions
What's CPU, RAM, storage layer, and data density per node? Exact heap settings would be nice. In the logs, look for TombstoneOverwhelmingException.

-- Ryan Svihla, Solution Architect, DataStax
Re: 100% CPU utilization, ParNew and never completing compactions
AWS r3.xlarge, 30GB RAM, but only using a heap of 10GB and a new size of 2GB, because we might go c3.2xlarge instead if CPU is more important than RAM. Storage is EBS-optimized SSD (but iostat shows no real I/O going on). Each node only has about 10GB of data, with ownership of 67%, 64.7%, and 68.3%. On the node where I raised the heap to 10GB from 6GB, utilization has dropped to 46% nice now, but the ParNew log messages still continue at the same pace. I'm going to up the heap to 20GB for a bit and see if that brings the nice CPU further down. No TombstoneOverwhelmingExceptions.
Re: 100% CPU utilization, ParNew and never completing compactions
Sorry, I meant 15GB heap on the one machine that has less nice CPU% now. The others are 6GB.
Re: 100% CPU utilization, ParNew and never completing compactions
Changed the 15GB node to a 25GB heap and the nice CPU is down to ~20% now. I checked my dev cluster to see if the ParNew log entries are just par for the course, but I'm not seeing them there. However, both clusters log the following every 30 seconds:

DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java (line 165) Started replayAllFailedBatches
DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899 ColumnFamilyStore.java (line 866) forceFlush requested but everything is clean in batchlog
DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java (line 200) Finished replayAllFailedBatches

Is that just routine scheduled housekeeping or a sign of something else?
Re: 100% CPU utilization, ParNew and never completing compactions
So a heap of that size without some tuning will create a number of problems (high CPU usage being one of them). I suggest either an 8GB heap and 400MB parnew (which I'd only set that low for that low a CPU count), or attempt the tunings indicated in https://issues.apache.org/jira/browse/CASSANDRA-8150

-- Ryan Svihla, Solution Architect, DataStax
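For reference, the knobs being discussed live in conf/cassandra-env.sh; a minimal excerpt with the suggested values (the file expects both variables to be set together, or both left commented so the script auto-sizes them):

# conf/cassandra-env.sh (excerpt): suggested starting point from above.
# Set MAX_HEAP_SIZE and HEAP_NEWSIZE together, or neither.
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="400M"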
Re: 100% CPU utilization, ParNew and never completing compactions
Also, based on the replayed batches: are you using batches to load data?

-- Ryan Svihla, Solution Architect, DataStax
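Background on why that question matters: logged batches (the CQL default for BEGIN BATCH) are written to the batchlog, which is what replayAllFailedBatches services, while unlogged batches skip it. A sketch with the DataStax Python driver of grouping inserts for a single partition into an unlogged batch; the table and values are hypothetical:

# Sketch: an UNLOGGED batch of inserts for a single partition, which avoids
# the batchlog entirely. Table and column names are hypothetical.
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, BatchType

session = Cluster(["127.0.0.1"]).connect("media")  # keyspace placeholder

insert = session.prepare(
    "INSERT INTO media_tracks_raw (media_id, t, frame) VALUES (?, ?, ?)")

batch = BatchStatement(batch_type=BatchType.UNLOGGED)
for t, frame in enumerate(["f0", "f1", "f2"]):  # toy frame data
    batch.add(insert, (42, t, frame))           # same partition key: 42
session.execute(batch)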
Best Time Series insert strategy
I have a time series table consisting of frame information for media. The table is partitioned on the media ID and uses time and some other frame-level keys as clustering keys; i.e., all frames for one piece of media are really one column family row, even though they are represented in CQL as an ordered series of frame data. The size of these sets varies from 5k to 200k rows per media, and they are always inserted at one time and are available in memory in ordered form. I'm currently fanning the inserts out via async calls, using a queue to fix the max parallelism (set to 100 right now). For some of the larger sets (50k and above) I sometimes get the following exception:

com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
at com.datastax.driver.core.Responses$Error.asException(Responses.java:93) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:110) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:237) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:402) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]

I've tried reducing the max parallelism and increasing the timeout threshold, but once the cluster gets humming from a bunch of inserts, even going as low as 10 in parallel doesn't completely avoid those exceptions. I realize that fanning out means previously ordered data is now arriving at random nodes in random order and has to get to the partition-key-owning nodes and be re-ordered as it arrives, which seems like the wrong way to do it. However, the parallelism approach does increase insert speed almost linearly, except for those timeouts. I'm wondering what the best approach would be. The scenarios I can think of are:
1) Retry and back off on timeout exceptions, but keep the fan-out approach. Seems like a good approach, unless the timeout really is a warning that I'm overloading things.
2) Switch to BATCH inserts. Would this be better, since the data would go to only a single node and be inserted in ordered form? And would this even alleviate timeouts, since now giant batches need to be acknowledged by the replicas?
3) Go to consistency ANY. The docs seem to imply that TimeoutException isn't really a failure, just a heads up. I don't really care about waiting for all replicas to be up to date on these inserts anyhow, but is it really safe, or am I looking at replicas drifting out of sync?
4) Figure out how to tune my cluster better and change nothing on the client.
thanks, arne
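A minimal sketch of option 1, bounded-parallelism async inserts with synchronous retry and exponential backoff, using the DataStax Python driver; the schema, names, and tuning constants are placeholders rather than anything from this thread:

# Sketch: insert an in-memory, ordered frame series with a bounded number of
# writes in flight, retrying timed-out writes with exponential backoff.
import time

from cassandra import WriteTimeout
from cassandra.cluster import Cluster

CHUNK = 100        # max writes in flight at once, mirrors the queue of 100
MAX_RETRIES = 3    # arbitrary starting point

session = Cluster(["127.0.0.1"]).connect("media")   # keyspace placeholder
insert = session.prepare(
    "INSERT INTO frames (media_id, t, frame) VALUES (?, ?, ?)")

rows = [(42, t, "frame-data") for t in range(5000)]  # toy data, one partition

for start in range(0, len(rows), CHUNK):
    chunk = rows[start:start + CHUNK]
    futures = [(row, session.execute_async(insert, row)) for row in chunk]
    for row, future in futures:
        for attempt in range(MAX_RETRIES + 1):
            try:
                future.result()          # blocks until this write completes
                break
            except WriteTimeout:
                if attempt == MAX_RETRIES:
                    raise
                time.sleep(0.1 * 2 ** attempt)       # back off, then retry
                future = session.execute_async(insert, row)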
Re: 100% CPU utilization, ParNew and never completing compactions
The starting configuration I had, which is still running on two of the nodes, was a 6GB heap and 1024MB parnew, which is close to what you are suggesting, and those have been pegged at load 4 for over 12 hours with hardly any read or write traffic. I will set one to 8GB/400MB and see if its load changes.
Re: 100% CPU utilization, ParNew and never completing compactions
So 1024 is still a good 2.5 times what I'm suggesting, and 6GB is hardly enough to run Cassandra well in, especially if you're going full bore on loads. However, you may just flat out be CPU bound on your write throughput: how many TPS and what size writes do you have? Also, what is your widest row? Final question: what is compaction throughput set at?
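On the compaction-throughput question: the cap is compaction_throughput_mb_per_sec in cassandra.yaml (default 16), and it can also be changed at runtime. A sketch, assuming a packaged install with the config under /etc/cassandra:

grep compaction_throughput /etc/cassandra/cassandra.yaml
nodetool setcompactionthroughput 32    # new cap in MB/s; 0 disables throttling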
Re: 100% CPU utilization, ParNew and never completing compactions
Actually, I'm not sure why the machine was originally configured at 6GB, since we even started it on an r3.large with 15GB.

Re: Batches. Not using batches. I actually have that as a separate question on the list. Currently I fan out async single inserts, and I'm wondering if batches are better, since my data is inherently inserted in blocks of ordered rows for a single partition key.

Re: Traffic. There isn't all that much traffic. Inserts come in as blocks per partition key, and a block can be 5k-200k rows for that partition key. Each of these rows is less than 100k; it's small, lots of ordered rows. It's frame and sub-frame information for media, and the rows for one piece of media (the partition key) are inserted at once. For the last 12 hours, where the load on all these machines has been stuck, there's been virtually no traffic at all. This is the nodes basically sitting idle, except that they had a load of 4 each.

BTW, how do you determine the widest row, or for that matter the number of tombstones in a row?

thanks,
arne
Re: 100% CPU utilization, ParNew and never completing compactions
Can you define what "virtually no traffic" is? Sorry to be repetitive about that, but I've worked on a lot of clusters in the past year and people have wildly different ideas of what that means.

Unlogged batches of the same partition key are definitely a performance optimization. Typically async is much faster and easier on the cluster when you're using multi-partition-key batches.

nodetool cfhistograms keyspace tablename
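To make that cfhistograms suggestion concrete, a sketch using the keyspace and table names that come up later in this thread (media and media_tracks_raw; substitute your own). The "Partition Size" histogram in its output answers the widest-row question; note the numbers are per node, so it has to be run against each node:

nodetool cfhistograms media media_tracks_raw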
Re: 100% CPU utilization, ParNew and never completing compactions
No problem with the follow-up questions. I'm on a crash course here trying to understand what makes C* tick, so I appreciate all feedback.

We reprocessed all media (1200 partition keys) last night, where partition keys had somewhere between 4k and 200k rows. After that completed, no traffic went to the cluster at all for ~8 hours, and throughout today we may get a couple (less than 10) queries per second and maybe 3-4 write batches per hour.

I assume the last values in the Partition Size histogram are the largest rows:

20924300 bytes: 79
25109160 bytes: 57

The majority seems clustered around 20 bytes.

I will look at switching my inserts to unlogged batches, since they are always for one partition key.
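For the single-partition case described here, the unlogged-batch shape would look something like the sketch below, written against the PRIMARY KEY ((id), trackid, timestamp) schema given later in the thread (non-key columns omitted; the ? markers are prepared-statement placeholders). Because every statement targets the same id, the whole batch lands on one replica set and skips the batch log:

BEGIN UNLOGGED BATCH
  INSERT INTO media.media_tracks_raw (id, trackid, timestamp) VALUES (?, ?, ?);
  INSERT INTO media.media_tracks_raw (id, trackid, timestamp) VALUES (?, ?, ?);
APPLY BATCH;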
Re: 100% CPU utilization, ParNew and never completing compactions
Ok, based on those numbers I have a theory... can you show me nodetool tpstats for all 3 nodes?
Re: 100% CPU utilization, ParNew and never completing compactions
Of course QA decided to start a test batch (still relatively low traffic), so I hope it doesn't throw the tpstats off too much.

Node 1:
Pool Name               Active  Pending  Completed  Blocked  All time blocked
MutationStage                0        0   13804928        0                 0
ReadStage                    0        0      10975        0                 0
RequestResponseStage         0        0    7725378        0                 0
ReadRepairStage              0        0       1247        0                 0
ReplicateOnWriteStage        0        0          0        0                 0
MiscStage                    0        0          0        0                 0
HintedHandoff                1        1         50        0                 0
FlushWriter                  0        0        306        0                31
MemoryMeter                  0        0        719        0                 0
GossipStage                  0        0     286505        0                 0
CacheCleanupExecutor         0        0          0        0                 0
InternalResponseStage        0        0          0        0                 0
CompactionExecutor           4       14        159        0                 0
ValidationExecutor           0        0          0        0                 0
MigrationStage               0        0          0        0                 0
commitlog_archiver           0        0          0        0                 0
AntiEntropyStage             0        0          0        0                 0
PendingRangeCalculator       0        0         11        0                 0
MemtablePostFlusher          0        0       1781        0                 0

Message type      Dropped
READ                    0
RANGE_SLICE             0
_TRACE                  0
MUTATION           391041
COUNTER_MUTATION        0
BINARY                  0
REQUEST_RESPONSE        0
PAGED_RANGE             0
READ_REPAIR             0

Node 2:
Pool Name               Active  Pending  Completed  Blocked  All time blocked
MutationStage                0        0     997042        0                 0
ReadStage                    0        0       2623        0                 0
RequestResponseStage         0        0     706650        0                 0
ReadRepairStage              0        0        275        0                 0
ReplicateOnWriteStage        0        0          0        0                 0
MiscStage                    0        0          0        0                 0
HintedHandoff                2        2         12        0                 0
FlushWriter                  0        0         37        0                 4
MemoryMeter                  0        0         70        0                 0
GossipStage                  0        0      14927        0                 0
CacheCleanupExecutor         0        0          0        0                 0
InternalResponseStage        0        0          0        0                 0
CompactionExecutor           4        7         94        0                 0
ValidationExecutor           0        0          0        0                 0
MigrationStage               0        0          0        0                 0
commitlog_archiver           0        0          0        0                 0
AntiEntropyStage             0        0          0        0                 0
PendingRangeCalculator       0        0          3        0                 0
MemtablePostFlusher          0        0        114        0                 0

Message type      Dropped
READ                    0
RANGE_SLICE             0
_TRACE                  0
MUTATION                0
COUNTER_MUTATION        0
BINARY                  0
REQUEST_RESPONSE        0
PAGED_RANGE             0
READ_REPAIR             0

Node 3:
Pool Name               Active  Pending  Completed  Blocked  All time blocked
MutationStage                0        0    1539324        0                 0
ReadStage                    0        0       2571        0                 0
RequestResponseStage         0        0     373300        0                 0
ReadRepairStage              0        0        325        0                 0
ReplicateOnWriteStage        0        0          0        0                 0
MiscStage                    0        0          0        0                 0
HintedHandoff                1        1         21        0                 0
FlushWriter                  0        0         38        0                 5
MemoryMeter                  0        0
Re: 100% CPU utilization, ParNew and never completing compactions
So you've got some blocked flush writers, but you have an incredibly large number of dropped mutations. Are you using secondary indexes, and if so, how many? What is your flush queue set to?
Re: 100% CPU utilization, ParNew and never completing compactions
Not using any secondary indices, and memtable_flush_queue_size is at the default of 4. But let me tell you how data is mutated right now; maybe that will give you an insight into how this is happening.

Basically, the frame data table has the following primary key:

PRIMARY KEY ((id), trackid, timestamp)

Generally data is inserted once, so day-to-day writes are all new rows. However, when our process for generating analytics for these rows changes, we run the media back through again, causing overwrites. Up until last night this was just a new insert, because the PK never changed, so it was always a 1-to-1 overwrite of every row. Last night was the first time a change went in where the PK could actually change, so now the process is always: DELETE by partition key, insert all rows for that partition key, repeat.

We have two tables with similar frame data projections, and some other aggregates with a much smaller row count per partition key.

hope that helps,
arne
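A sketch of the reprocessing pattern just described, and why it matters for tombstones: the partition-level DELETE writes a partition tombstone that sticks around for gc_grace_seconds, and compaction has to reconcile it against both the old rows and the freshly re-inserted ones (table and key names from the schema above; illustrative only, with ? as placeholders):

DELETE FROM media.media_tracks_raw WHERE id = ?;                               -- one partition-level tombstone
INSERT INTO media.media_tracks_raw (id, trackid, timestamp) VALUES (?, ?, ?);  -- repeated for every re-inserted row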
Re: 100% CPU utilization, ParNew and never completing compactions
So a delete is really another write for gc_grace_seconds (default 10 days); if you get enough tombstones it can make managing your cluster a challenge as is. Open up cqlsh, turn on tracing, and try a few queries: how many tombstones are scanned for a given query? It's possible the heap problems you're seeing are actually happening on the query side and not on the ingest side. The severity of this depends on driver and Cassandra version, but older drivers and versions of Cassandra could easily overload the heap with expensive selects; layered over tombstones, it certainly becomes a possibility that this is your root cause.

Now this will primarily create more load on compaction, and depending on your Cassandra version there may be some other issue at work, but something I can tell you is that every time I see 1 dropped mutation, I see a cluster that was overloaded enough that it had to shed load. If I see 200k, I see a cluster/configuration/hardware that is badly overloaded.

I suggest the following:
- trace some of the queries used in prod
- monitor your ingest rate; see at what levels you run into issues (GCInspector log messages, dropped mutations, etc)
- the heap configuration we mentioned earlier: go ahead and monitor heap usage; if it hits 75% repeatedly, that is an indication of heavy load
- monitor dropped mutations: any dropped mutation is evidence of an overloaded server; again, the root cause can be many other problems that are solvable with current hardware, and LOTS of people run with nodes of similar configuration
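Spelled out, the tracing exercise in cqlsh is just the following (partition key borrowed from the trace Arne posts later in the thread). In 2.0-era trace output, the line to watch should read something like "Read N live and M tombstoned cells"; M is the tombstone count:

TRACING ON;
SELECT * FROM media.media_tracks_raw WHERE id = 74fe9449-8ac4-accb-a723-4bad024101e3 LIMIT 100;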
Re: 100% CPU utilization, ParNew and never completing compactions
I just did a wide set of selects and ran across no tombstones. But while on the subject of gc_grace_seconds: is there any reason, on a small cluster, not to set it to something low, like a single day? It seems like 10 days is only needed for large clusters undergoing long partition splits, or am I misunderstanding gc_grace_seconds?

Now, given all that, does any of this explain a high load when the cluster is idle? Is it compaction catching up, and would manually forcing compaction alleviate that?

thanks,
arne
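For reference, gc_grace_seconds is a per-table setting and can be changed online; a sketch of dropping it to one day (86400 seconds) for the table in this thread. The standard caveat: gc_grace_seconds also bounds the window in which hints and repairs must deliver a delete to a replica that missed it, so every node needs to be repaired within that window or deleted data can come back:

ALTER TABLE media.media_tracks_raw WITH gc_grace_seconds = 86400;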
Re: 100% CPU utilization, ParNew and never completing compactions
Manual forced compactions create more problems than they solve. If you have no evidence of tombstones in your selects (which seems odd; can you share some of the tracing output?), then I'm not sure what it would solve for you.

Compaction running could explain a high load. Log messages with ERROR, WARN, or GCInspector are all meaningful there; I suggest searching JIRA for your version to see if there are any interesting bugs.
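A quick way to pull the messages Ryan mentions out of the logs; a sketch assuming the default packaged log location:

grep -E 'ERROR|WARN|GCInspector' /var/log/cassandra/system.log | tail -50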
Questions about bootstrapping and compactions during bootstrapping
Looking at the output of nodetool netstats, I see that the bootstrapping node is pulling from only two of the nine nodes currently in the datacenter. That surprises me: I'd think the vnodes it pulls from would be randomly spread across the existing nodes. We're using Cassandra 2.0.11 with 256 vnodes each.

I also notice that while bootstrapping, the node is quite busy doing compactions. There are over 1000 pending compactions on the new node and it's not finished bootstrapping. I'd think those would be unnecessary, since the other nodes in the data center have zero pending compactions. Perhaps the compactions explain why running du -hs /var/lib/cassandra/data on the new node shows more disk space usage than on the old nodes. Is it reasonable to do nodetool disableautocompaction on the bootstrapping node? Should that be the default???

If I start bootstrapping one node, it's not yet in the cluster but it decides which token ranges it owns and requests streams for that data. If I then try to bootstrap a SECOND node concurrently, it will take over ownership of some token ranges from the first node. Will the first node then adjust what data it streams? It seems to me the Cassandra server needs to keep track of both the OLD token ranges and vnodes and the NEW ones. I'm not convinced that running two bootstraps concurrently (starting the second one after several minutes of delay) is safe.

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866  C: (206) 819-5965  F: (646) 443-2333
dona...@audiencescience.com
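If you do experiment with turning compaction off on the joining node, the relevant commands are below (a sketch; with no arguments they apply to all keyspaces, and nothing re-enables compaction for you afterwards):

nodetool disableautocompaction
# ...after bootstrap completes:
nodetool enableautocompaction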
Re: 100% CPU utilization, ParNew and never completing compactions
That's just the thing: there is nothing in the logs except the constant ParNew collections, like

DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888

But the load is staying continuously high. There's always some compaction going on, on just that one table, media_tracks_raw, and those values rarely change (certainly the remaining time is meaningless):

pending tasks: 17
compaction type   keyspace   table              completed    total        unit   progress
Compaction        media      media_tracks_raw   444294932    1310653468   bytes  33.90%
Compaction        media      media_tracks_raw   131931354    3411631999   bytes  3.87%
Compaction        media      media_tracks_raw   30308970     23097672194  bytes  0.13%
Compaction        media      media_tracks_raw   899216961    1815591081   bytes  49.53%
Active compaction remaining time : 0h27m56s

Here's a sample of a query trace:

activity                                                                                          | timestamp    | source        | source_elapsed
--------------------------------------------------------------------------------------------------+--------------+---------------+---------------
execute_cql3_query                                                                                | 00:11:46,612 | 10.140.22.236 |              0
Parsing select * from media_tracks_raw where id =74fe9449-8ac4-accb-a723-4bad024101e3 limit 100; | 00:11:46,612 | 10.140.22.236 |             47
Preparing statement                                                                               | 00:11:46,612 | 10.140.22.236 |            234
Sending message to /10.140.21.54                                                                  | 00:11:46,619 | 10.140.22.236 |           7190
Message received from /10.140.22.236                                                              | 00:11:46,622 | 10.140.21.54  |             12
Executing single-partition query on media_tracks_raw                                              | 00:11:46,644 | 10.140.21.54  |          21971
Acquiring sstable references                                                                      | 00:11:46,644 | 10.140.21.54  |          22029
Merging memtable tombstones                                                                       | 00:11:46,644 | 10.140.21.54  |          22131
Bloom filter allows skipping sstable 1395                                                         | 00:11:46,644 | 10.140.21.54  |          22245
Bloom filter allows skipping sstable 1394                                                         | 00:11:46,644 | 10.140.21.54  |          22279
Bloom filter allows skipping sstable 1391                                                         | 00:11:46,644 | 10.140.21.54  |          22293
Bloom filter allows skipping sstable 1381                                                         | 00:11:46,644 | 10.140.21.54  |          22304
Bloom filter allows skipping sstable 1376                                                         | 00:11:46,644 | 10.140.21.54  |          22317
Bloom filter allows skipping sstable 1368                                                         | 00:11:46,644 | 10.140.21.54  |          22328
Bloom filter allows skipping sstable 1365                                                         | 00:11:46,644 | 10.140.21.54  |          22340
Bloom filter allows skipping sstable 1351                                                         | 00:11:46,644 | 10.140.21.54  |          22352
Bloom filter allows skipping sstable 1367                                                         | 00:11:46,644 | 10.140.21.54  |          22363
Bloom filter allows skipping sstable 1380                                                         | 00:11:46,644 | 10.140.21.54  |          22374
Bloom filter allows skipping sstable 1343                                                         | 00:11:46,644 | 10.140.21.54  |          22386
Bloom filter allows skipping sstable 1342                                                         | 00:11:46,644 | 10.140.21.54  |          22397
Bloom filter allows skipping sstable 1334                                                         | 00:11:46,644 | 10.140.21.54  |          22408
Bloom filter allows skipping sstable 1377                                                         | 00:11:46,644 | 10.140.21.54  |          22429
Bloom filter allows skipping sstable 1330                                                         | 00:11:46,644 | 10.140.21.54  |          22441
Bloom filter allows skipping sstable 1329                                                         | 00:11:46,644 | 10.140.21.54  |          22452
Bloom filter allows skipping sstable 1328                                                         | 00:11:46,644 | 10.140.21.54  |          22463
Bloom filter allows skipping sstable 1327                                                         | 00:11:46,644 | 10.140.21.54  |          22475
Re: 100% CPU utilization, ParNew and never completing compactions
What version of Cassandra?
Re: 100% CPU utilization, ParNew and never completing compactions
Cassandra 2.0.10 and Datastax Java Driver 2.1.1
[Consistency on cqlsh command prompt]
Hi,

When I set consistency to QUORUM at the cqlsh command line, it says consistency is set to QUORUM:

cqlsh:testdb> CONSISTENCY QUORUM;
Consistency level set to QUORUM.

However, when I check it back using the CONSISTENCY command at the prompt, it says consistency is 4:

cqlsh:testdb> CONSISTENCY;
Current consistency level is 4.

It should be 2, as my replication factor for the keyspace is 3. Isn't QUORUM consistency calculated as (replication_factor/2)+1, where replication_factor/2 is rounded down? If yes, then why is consistency displayed as 4, when it should be 2 ((3/2 = 1.5, rounded down to 1) + 1 = 2)?

I am using Cassandra version 2.1.2, cqlsh 5.0.1, and CQL spec 3.2.0.

Thanks in advance!
Nitin Padalia
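For reference, the arithmetic in the question is right: at RF=3 a QUORUM operation needs 2 replicas. The 4 that cqlsh prints is almost certainly not a replica count but the native-protocol code for the QUORUM level (ANY=0, ONE=1, TWO=2, THREE=3, QUORUM=4, ...), which this cqlsh version appears to print instead of the name:

quorum(RF) = floor(RF / 2) + 1
quorum(3)  = floor(3 / 2) + 1 = 1 + 1 = 2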
Re: 100% CPU utilization, ParNew and never completing compactions
Maybe checking which thread(s) are hogging the CPU would hint at what's going on? (See http://www.boxjar.com/using-top-and-jstack-to-find-the-java-thread-that-is-hogging-the-cpu/)
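The linked approach, condensed into a sketch (assuming a single Cassandra JVM on the host, whose main class is CassandraDaemon; <tid> stands for the decimal thread id read off top):

top -H -p $(pgrep -f CassandraDaemon)    # thread view; note the TID of the busiest thread
printf '%x\n' <tid>                      # convert to hex; jstack reports thread ids as nid=0x...
jstack $(pgrep -f CassandraDaemon) | grep -A 20 'nid=0x<hextid>'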