Re: index_interval
I would also optimize for your worst case, which is hitting zero caches. If you're using the default settings when creating a table, you're going to get compression settings that are terrible for reads. If you've got memory to spare, I suggest changing your chunk_length_in_kb to 4 and disabling readahead on your drives entirely. I've seen 50-100x improvement in read latency and throughput just by changing those settings.

I just did a talk on this topic last week, slides are here:
https://www.slideshare.net/JonHaddad/performance-tuning-86995333

Jon

On Wed, Jul 12, 2017 at 2:03 PM Jeff Jirsa <jji...@apache.org> wrote:
> On 2017-07-12 12:03 (-0700), Fay Hou [Storage Service] <fay...@coupang.com> wrote:
> > First, a big thanks to Jeff, who spent endless time helping this mailing list.
> > Agreed that we should tune the key cache. In my case, my key cache hit rate
> > is about 20%, mainly because we do random reads. We're just going to leave
> > the index_interval as is for now.
>
> That's pretty painful. If you can up that a bit, it'll probably help you
> out. You can adjust the index intervals, too, but I'd significantly
> increase key cache size first if it were my cluster.

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org
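For anyone wanting to try Jon's suggestion, the chunk size is a per-table compression option. A sketch of the change (the keyspace/table names are placeholders, and the 3.x-style compression map syntax with `'class'` is an assumption -- older versions use `'sstable_compression'` instead):

```
ALTER TABLE my_keyspace.my_table
  WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};
```

Note that existing SSTables keep their old chunk size until they are rewritten (for example by compaction or `nodetool upgradesstables -a`), and readahead is a separate OS-level setting (e.g. `blockdev --setra 0` on the data drive).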
Re: index_interval
On 2017-07-12 12:03 (-0700), Fay Hou [Storage Service] <fay...@coupang.com> wrote:
> First, a big thanks to Jeff, who spent endless time helping this mailing list.
> Agreed that we should tune the key cache. In my case, my key cache hit rate
> is about 20%, mainly because we do random reads. We're just going to leave
> the index_interval as is for now.

That's pretty painful. If you can up that a bit, it'll probably help you out. You can adjust the index intervals, too, but I'd significantly increase key cache size first if it were my cluster.
Re: index_interval
First, a big thanks to Jeff, who spent endless time helping this mailing list. Agreed that we should tune the key cache. In my case, my key cache hit rate is about 20%, mainly because we do random reads. We're just going to leave the index_interval as is for now.

On Mon, Jul 10, 2017 at 8:47 PM, Jeff Jirsa <jji...@apache.org> wrote:
> On 2017-07-10 15:09 (-0700), Fay Hou [Storage Service] <fay...@coupang.com> wrote:
> > By default:
> >
> > AND max_index_interval = 2048
> > AND memtable_flush_period_in_ms = 0
> > AND min_index_interval = 128
> >
> > "Cassandra maintains index offsets per partition to speed up the lookup
> > process in the case of key cache misses (see cassandra read path overview
> > <http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_reads_c.html>).
> > By default it samples a subset of keys, somewhat similar to a skip list.
> > The sampling interval is configurable with min_index_interval and
> > max_index_interval CQL schema attributes (see describe table). For
> > relatively large blobs like HTML pages we seem to get better read latencies
> > by lowering the sampling interval from 128 min / 2048 max to 64 min / 512
> > max. For large tables like parsoid HTML with ~500G load per node this
> > change adds a modest ~25mb off-heap memory."
> >
> > I wonder if anyone has experience working with max and min index_interval
> > to increase the read speed.
>
> It's usually more efficient to try to tune the key cache, and hope you
> never have to hit the partition index at all. Do you have reason to believe
> you're spending an inordinate amount of IO scanning the partition index? Do
> you know what your key cache hit rate is?
Re: index_interval
On 2017-07-10 15:09 (-0700), Fay Hou [Storage Service] <fay...@coupang.com> wrote:
> By default:
>
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
>
> "Cassandra maintains index offsets per partition to speed up the lookup
> process in the case of key cache misses (see cassandra read path overview
> <http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_reads_c.html>).
> By default it samples a subset of keys, somewhat similar to a skip list.
> The sampling interval is configurable with min_index_interval and
> max_index_interval CQL schema attributes (see describe table). For
> relatively large blobs like HTML pages we seem to get better read latencies
> by lowering the sampling interval from 128 min / 2048 max to 64 min / 512
> max. For large tables like parsoid HTML with ~500G load per node this
> change adds a modest ~25mb off-heap memory."
>
> I wonder if anyone has experience working with max and min index_interval
> to increase the read speed.

It's usually more efficient to try to tune the key cache, and hope you never have to hit the partition index at all. Do you have reason to believe you're spending an inordinate amount of IO scanning the partition index? Do you know what your key cache hit rate is?
index_interval
By default:

AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128

"Cassandra maintains index offsets per partition to speed up the lookup process in the case of key cache misses (see cassandra read path overview <http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_reads_c.html>). By default it samples a subset of keys, somewhat similar to a skip list. The sampling interval is configurable with min_index_interval and max_index_interval CQL schema attributes (see describe table). For relatively large blobs like HTML pages we seem to get better read latencies by lowering the sampling interval from 128 min / 2048 max to 64 min / 512 max. For large tables like parsoid HTML with ~500G load per node this change adds a modest ~25mb off-heap memory."

I wonder if anyone has experience working with max and min index_interval to increase the read speed.

Thanks,
Fay
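The sampling scheme quoted above can be sketched in a few lines of Python (a toy model, not Cassandra's actual code): the full partition index stays on disk, only every index_interval-th entry is held in memory, and a key-cache miss costs a binary search of the sample plus at most index_interval sequential index entries.

```python
import bisect

def build_summary(index_keys, interval):
    """Sample every `interval`-th entry of the sorted on-disk index,
    keeping (key, position) pairs in memory."""
    return [(k, i) for i, k in enumerate(index_keys) if i % interval == 0]

def lookup(index_keys, summary, key):
    """Find `key`: binary-search the in-memory summary, then scan at
    most `interval` on-disk index entries from the sampled position."""
    keys_only = [k for k, _ in summary]
    i = bisect.bisect_right(keys_only, key) - 1
    if i < 0:
        return None                       # key sorts before the whole index
    for pos in range(summary[i][1], len(index_keys)):
        if index_keys[pos] == key:
            return pos                    # conceptually, an offset into the data file
        if index_keys[pos] > key:
            break
    return None

keys = [f"k{n:04d}" for n in range(0, 1000, 2)]   # sorted partition keys
summary = build_summary(keys, 128)                 # ~1/128th of the index in RAM
assert lookup(keys, summary, "k0500") == keys.index("k0500")
assert lookup(keys, summary, "k0501") is None      # absent key
```

Raising the interval shrinks `summary` (less RAM) but lengthens the scan after the binary search, which is exactly the RAM-vs-read-cost trade-off discussed in this thread.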
Re: index_interval
On Mon, May 13, 2013 at 9:19 PM, Bryan Talbot btal...@aeriagames.com wrote:
> Can the index sample storage be treated more like key cache or row cache, where the total space used can be limited to something less than all available system RAM, and space is recycled using an LRU (or configurable) algorithm?

Treating it with LRU doesn't seem to make that much sense, but there are seemingly trivial ways to prune an Index Sample [1], like delete-every-other-key. A brief conversation with driftx suggests a lack of enthusiasm for the scale of win potential from active pruning of the Index Sample, especially given the relative size of bloom filters compared to the Index Sample. However, if you are interested in this as a potential improvement, feel free to file a JIRA! :D

=Rob
[1] New terminology: "Partition Summary", per jbellis keynote @ summit2013
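The delete-every-other-key pruning Rob mentions is simple enough to sketch (illustrative Python, not Cassandra code): dropping alternate entries halves the sample's memory while doubling the effective sampling interval, i.e. the worst-case scan of the on-disk index after a lookup.

```python
def prune_sample(summary):
    """Drop every other (key, offset) entry from an index sample,
    halving its memory footprint and doubling the worst-case number
    of on-disk index entries scanned after a binary search."""
    return summary[::2]

# A toy sample taken every 128 keys; after pruning it is every 256.
sample = [("a", 0), ("f", 128), ("m", 256), ("t", 384)]
assert prune_sample(sample) == [("a", 0), ("m", 256)]
```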
Re: index_interval
From: Robert Coli rc...@eventbrite.com
To: user@cassandra.apache.org
Sent: Monday, June 17, 2013 3:28 PM
Subject: Re: index_interval

On Mon, May 13, 2013 at 9:19 PM, Bryan Talbot btal...@aeriagames.com wrote:
> Can the index sample storage be treated more like key cache or row cache, where the total space used can be limited to something less than all available system RAM, and space is recycled using an LRU (or configurable) algorithm?

Treating it with LRU doesn't seem to make that much sense, but there are seemingly trivial ways to prune an Index Sample [1], like delete-every-other-key. A brief conversation with driftx suggests a lack of enthusiasm for the scale of win potential from active pruning of the Index Sample, especially given the relative size of bloom filters compared to the Index Sample. However, if you are interested in this as a potential improvement, feel free to file a JIRA! :D

=Rob
[1] New terminology: "Partition Summary", per jbellis keynote @ summit2013
Re: index_interval
So will cassandra provide a way to limit its off-heap usage to avoid unexpected OOM kills? I'd much rather have performance degrade when 100% of the index samples no longer fit in memory than have the process killed with no way to stabilize it without adding hardware or removing data.

-Bryan

On Fri, May 10, 2013 at 7:44 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
> If you use up your off-heap memory, linux has an OOM killer that will kill a random task.

On Fri, May 10, 2013 at 11:34 AM, Bryan Talbot btal...@aeriagames.com wrote:
> If off-heap memory (for index samples, bloom filters, row caches, key caches, etc.) is exhausted, will cassandra experience a memory allocation error and quit? If so, are there plans to make the off-heap usage more dynamic, to allow less-used pages to be replaced with hot data and the paged-out / cold data read back in again on demand?
Re: index_interval
If off-heap memory (for index samples, bloom filters, row caches, key caches, etc.) is exhausted, will cassandra experience a memory allocation error and quit? If so, are there plans to make the off-heap usage more dynamic, to allow less-used pages to be replaced with hot data and the paged-out / cold data read back in again on demand?

-Bryan

On Wed, May 8, 2013 at 4:24 PM, Jonathan Ellis jbel...@gmail.com wrote:
> index_interval won't be going away, but you won't need to change it as often in 2.0: https://issues.apache.org/jira/browse/CASSANDRA-5521

On Mon, May 6, 2013 at 12:27 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
> I heard a rumor that index_interval is going away? What is the replacement for this? (We have been having to play with this setting a lot lately: too big and it gets slow, yet too small and cassandra uses way too much RAM... we are still trying to find the right balance with this setting.)
>
> Thanks,
> Dean

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
Re: index_interval
If you use up your off-heap memory, linux has an OOM killer that will kill a random task.

On Fri, May 10, 2013 at 11:34 AM, Bryan Talbot btal...@aeriagames.com wrote:
> If off-heap memory (for index samples, bloom filters, row caches, key caches, etc.) is exhausted, will cassandra experience a memory allocation error and quit? If so, are there plans to make the off-heap usage more dynamic, to allow less-used pages to be replaced with hot data and the paged-out / cold data read back in again on demand?
>
> -Bryan
Re: index_interval
index_interval won't be going away, but you won't need to change it as often in 2.0: https://issues.apache.org/jira/browse/CASSANDRA-5521

On Mon, May 6, 2013 at 12:27 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
> I heard a rumor that index_interval is going away? What is the replacement for this? (We have been having to play with this setting a lot lately: too big and it gets slow, yet too small and cassandra uses way too much RAM... we are still trying to find the right balance with this setting.)
>
> Thanks,
> Dean

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
Re: index_interval
This is the closest I can find in Jira: https://issues.apache.org/jira/browse/CASSANDRA-4478

It's a pretty handy tool to have in your tool kit, especially when you start to have over 1 billion rows per node.

A

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 7/05/2013, at 5:27 AM, Hiller, Dean dean.hil...@nrel.gov wrote:
> I heard a rumor that index_interval is going away? What is the replacement for this? (We have been having to play with this setting a lot lately: too big and it gets slow, yet too small and cassandra uses way too much RAM... we are still trying to find the right balance with this setting.)
>
> Thanks,
> Dean
Re: index_interval file size is the same after modifying 128 to 512?
Dean, as I can see you are satisfied with the result of increasing ii from 128 to 512 -- didn't you observe any drawbacks of this change? I remember you mentioned no change in read latency and a significant drop in heap size, but did you check any other metrics?

I did the opposite (512 -> 128; before, we've had problems with heap size; now we can revert it, so I'm checking if it makes sense) and I do not see almost any difference in read latency either, but I can see that the number of dropped READ messages has decreased significantly (it's 1 or even 2 orders of magnitude lower for the nodes where I set ii = 128 compared to the nodes with ii = 512; the exact value is about 0.005/sec compared to about 0.01 - 0.2 for other nodes), and I have much fewer connection resets reported by netstat's Munin plugin. In other words, as I understand it, there are many fewer timeouts, which should improve overall C* performance, even if I can't see it in the read latency graph for CFs (unluckily I don't have a graph for StorageProxy latencies to easily check it). To make sure about the reason for these differences and their effect on C* performance, I'm looking for some references in other people's experience / observations :-)

M.

On 22.03.2013 17:17, Hiller, Dean wrote:
> I was just curious. Our RAM has significantly reduced, but the *Index.db files are the same size as before. Any ideas why this would be the case? Basically, why is our disk size not reduced since RAM is way lower? We are running strong now with 512 index_interval for the past 2-3 days and RAM never looked better. We were pushing 10G before and now we are at 2G, slowly increasing to 8G before gc compacts the long-lived stuff, which goes back down to 2G again... very pleased with LCS in our system!
>
> Thanks,
> Dean
Re: index_interval file size is the same after modifying 128 to 512?
We only look at our program's response time at the high level and have a scatter plot. The scatter plot shows no real differences, so even though what you say may be true, our end users are not seeing any differences. I have not checked any further because the high-level use cases look great.

Dean

On 3/26/13 2:35 AM, Michal Michalski mich...@opera.com wrote:
> Dean, as I can see you are satisfied with the result of increasing ii from 128 to 512 -- didn't you observe any drawbacks of this change? I remember you mentioned no change in read latency and a significant drop in heap size, but did you check any other metrics? I did the opposite (512 -> 128; before, we've had problems with heap size; now we can revert it, so I'm checking if it makes sense) and I do not see almost any difference in read latency either, but I can see that the number of dropped READ messages has decreased significantly (it's 1 or even 2 orders of magnitude lower for the nodes where I set ii = 128 compared to the nodes with ii = 512; the exact value is about 0.005/sec compared to about 0.01 - 0.2 for other nodes), and I have much fewer connection resets reported by netstat's Munin plugin. In other words, as I understand it, there are many fewer timeouts, which should improve overall C* performance, even if I can't see it in the read latency graph for CFs (unluckily I don't have a graph for StorageProxy latencies to easily check it). To make sure about the reason for these differences and their effect on C* performance, I'm looking for some references in other people's experience / observations :-)
>
> M.
index_interval file size is the same after modifying 128 to 512?
I was just curious. Our RAM has significantly reduced, but the *Index.db files are the same size as before. Any ideas why this would be the case? Basically, why is our disk size not reduced since RAM is way lower?

We are running strong now with 512 index_interval for the past 2-3 days and RAM never looked better. We were pushing 10G before and now we are at 2G, slowly increasing to 8G before gc compacts the long-lived stuff, which goes back down to 2G again... very pleased with LCS in our system!

Thanks,
Dean
Re: index_interval file size is the same after modifying 128 to 512?
The Index.db file always contains the positions of *all* keys in the data file. index_interval is the rate at which key positions from the index file are also stored in memory, so that C* can begin scanning the index file from the closest sampled position.

On Friday, March 22, 2013 at 11:17 AM, Hiller, Dean wrote:
> I was just curious. Our RAM has significantly reduced, but the *Index.db files are the same size as before. Any ideas why this would be the case? Basically, why is our disk size not reduced since RAM is way lower? We are running strong now with 512 index_interval for the past 2-3 days and RAM never looked better. We were pushing 10G before and now we are at 2G, slowly increasing to 8G before gc compacts the long-lived stuff, which goes back down to 2G again... very pleased with LCS in our system!
>
> Thanks,
> Dean
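A quick back-of-the-envelope illustration of that answer (Python; the 1.7 billion keys-per-node figure is borrowed from Dean's numbers elsewhere in this thread): the on-disk index always holds one entry per key, so its size is independent of index_interval, while the in-memory sample shrinks proportionally.

```python
keys = 1_700_000_000                    # partition keys per node (illustrative)
on_disk_entries = keys                  # Index.db stores every key's position
sample_at_128 = keys // 128             # in-memory sample entries at ii = 128
sample_at_512 = keys // 512             # in-memory sample entries at ii = 512

assert on_disk_entries == keys          # unchanged by index_interval
assert sample_at_128 == 13_281_250
assert sample_at_512 == 3_320_312       # roughly 4x fewer entries held in RAM
```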
Re: index_interval memory savings in our case (if you are curious)... (and performance result)...
Wow. So LCS with a bloom filter fp chance of 0.1 and an index sampling rate of 512 on a column family of 1.7 billion rows per node yields a 100% result on first-sstable reads? That sounds amazing. And I assume this is cfhistograms output from a node that has been on 512 for a while? (I still think it's unlikely 1.2.x re-samples sstables on startup -- I'm on 1.1.x though.) For LCS, same fp chance and sampling rate, with 300-500 million rows per node (300-400GB) on 1.1.x, my sstable reads for a single read got pretty much out of control.

On 20/03/13 14:35, Hiller, Dean dean.hil...@nrel.gov wrote:

I am using LCS, so the bloom filter fp default for 1.2.2 is 0.1, and my bloom filter size is 1.27G RAM (nodetool cfstats), 1.7 billion rows per node. My cfstats for this CF is attached (since cut and paste screwed up the formatting). During testing in QA, we were not sure if the index_interval change was working, so we dug into the code to find out; it basically seems to immediately convert on startup, though it doesn't log anything except at a debug level, which we don't have on.

Dean

On 3/20/13 6:58 AM, Andras Szerdahelyi andras.szerdahe...@ignitionone.com wrote:

I am curious, thanks. (I am in the same situation: big nodes choking under 300-400G data load, 500 million keys.) How does your cfhistograms Keyspace CF output look? How many sstable reads? What is your bloom filter fp chance?

Regards,
Andras

On 20/03/13 13:54, Hiller, Dean dean.hil...@nrel.gov wrote:

Oh, and to give you an idea of memory savings, we had a node at 10G RAM usage... we had upped a few nodes to 16G from 8G as we don't have our new nodes ready yet (we know we should be at 8G, but we would have a dead cluster if we did that). On startup, the initial RAM is around 6-8G. Startup with index_interval=512 resulted in 2.5G-2.8G initial RAM, and I have seen it grow to 3.3G and back down to 2.8G. We just rolled this out an hour ago. Our website response time is the same as before as well. We rolled to only 2 nodes (out of 6) in our cluster so far to test it out and let it soak a bit. We will slowly roll to more nodes, monitoring the performance as we go. Also, since dynamic snitch is not working with SimpleSnitch, we know that just one slow node affects our website (from personal pain/experience of nodes hitting the RAM limit and slowing down, causing the website to get real slow).

Dean

On 3/20/13 6:41 AM, Andras Szerdahelyi andras.szerdahe...@ignitionone.com wrote:

> 2. Upping index_interval from 128 to 512 (this seemed to reduce our memory usage significantly!!!)

I'd be very careful with that as a one-stop improvement solution, for two reasons AFAIK: 1) you have to rebuild sstables (not an issue if you are evaluating, doing test writes, etc., not so much in production); 2) it can affect reads (number of sstable reads to serve a read), especially if your key/row cache is ineffective.

On 20/03/13 13:34, Hiller, Dean dean.hil...@nrel.gov wrote:

Also, look at the cassandra logs. I bet you see the typical "...blah blah is at 0.85, doing memory cleanup", which is not exactly GC but cassandra memory management... and of course, you have GC on top of that. If you need to get your memory down, there are multiple ways:
1. Switching size-tiered compaction to leveled compaction (with 1 billion narrow rows, this helped us quite a bit)
2. Upping index_interval from 128 to 512 (this seemed to reduce our memory usage significantly!!!)
3. Just add more nodes, as moving the rows to other servers reduces memory from #1 and #2 above since the server would have fewer rows

Later,
Dean

On 3/20/13 6:29 AM, Andras Szerdahelyi andras.szerdahe...@ignitionone.com wrote:

I'd say GC. Please fill in form CASS-FREEZE-001 below and get back to us :-) (sorry)

How big is your JVM heap? How many CPUs? Garbage collection taking long? (look for log lines from GCInspector) Running out of heap? ("heap is .. full" log lines) Any tasks backing up / being dropped? (nodetool tpstats and ".. dropped in last .. ms" log lines) Are writes really slow? (nodetool cfhistograms Keyspace ColumnFamily) How much is "lots of data"? Wide or skinny rows? Mutations/sec? Which compaction strategy are you using? Output of show schema (cassandra-cli) for the relevant Keyspace/CF might help as well. What consistency are you doing your writes with? I assume ONE or ANY if you have a single node. What are the values for these settings in cassandra.yaml:

memtable_total_space_in_mb:
memtable_flush_writers:
memtable_flush_queue_size:
compaction_throughput_mb_per_sec:
concurrent_writes:

Which version of Cassandra?

Regards,
Andras

From: Joel Samuelsson samuelsson.j...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Wednesday 20 March 2013 13:06
To: user@cassandra.apache.org
Subject: Cassandra freezes

Hello, I've been trying to load test a one node cassandra cluster. When I add lots of data, the Cassandra node freezes for 4-5 minutes, during which neither reads nor writes are served. During this time, Cassandra takes 100% of a single CPU core. My initial thought was that this was Cassandra flushing memtables to the disk; however, the disk i/o is very low during this time. Any idea what my problem could be? I'm running in a virtual environment in which I have no control of drives, so commit log and data directory are (probably) on the same drive.

Best regards,
Joel Samuelsson
Re: index_interval memory savings in our case (if you are curious)... (and performance result)...
Dean, what is your row size, approximately? We've been using ii = 512 for a long time because of memory issues, but now -- as the bloom filter is kept off-heap and memory is not an issue anymore -- I've reverted it to 128 to see if this improves anything. It seems it doesn't (except that I have fewer connection resets reported by Munin's netstat plugin, but I'm not 100% sure if it's related to the lower ii, as I don't really believe that the disk scan delay difference with ii = 512 could be so huge as to time out connections), but I'm just curious how far we are from the point where it will matter, to know if this might be an issue soon (our rows are growing in time -- not very fast, but they do), so I'm looking for some reference / comparison ;-)

Currently, according to cfhistograms, the vast majority (~70%) of our rows' size is up to 20KB and the rest is up to 50KB. I wonder if it's the size that really matters in terms of the ii value.

M.

On 20.03.2013 13:54, Hiller, Dean wrote:
> Oh, and to give you an idea of memory savings, we had a node at 10G RAM usage... we had upped a few nodes to 16G from 8G as we don't have our new nodes ready yet (we know we should be at 8G, but we would have a dead cluster if we did that). On startup, the initial RAM is around 6-8G. Startup with index_interval=512 resulted in 2.5G-2.8G initial RAM, and I have seen it grow to 3.3G and back down to 2.8G. We just rolled this out an hour ago. Our website response time is the same as before as well. We rolled to only 2 nodes (out of 6) in our cluster so far to test it out and let it soak a bit. We will slowly roll to more nodes, monitoring the performance as we go. Also, since dynamic snitch is not working with SimpleSnitch, we know that just one slow node affects our website (from personal pain/experience of nodes hitting the RAM limit and slowing down, causing the website to get real slow).
>
> Dean
Re: index_interval memory savings in our case (if you are curious)... (and performance result)...
Argh, now I think that row size has nothing to do with the ii-based index size/efficiency (I was thinking about the need of reading index_interval / 2 entries in average from index file before finding the proper one, but it should not have nothing to do with row size) - forget the question; need to get a second coffee ;-) M. W dniu 21.03.2013 09:29, Michal Michalski pisze: Dean, what is your row size approximately? We've been using ii = 512 for a long time because of memory issues, but now - as bloom filter is kept off-heap and memory is not an issue anymore - I've reverted it to 128 to see if this improves anything. It seems it doesn't (except that I have less connections resets reported by Munin's netstat plugin, but I'm not 100% sure if it's related to lower ii, as I don't really believe that disk scan delay difference with ii = 512 may be so huge to timeout connections), but I'm just curious how far are we from the point where it will matter to know if this might be an issue soon (our rows are growing in time - not very fast, but they do), so I'm looking for some reference / comparison ;-) Currently, according to cfhistograms, vast majority (~70%) of our rows' size is up to 20KB and the rest is up to 50KB. I wonder if it's the size that really matters in terms of ii value. M. W dniu 20.03.2013 13:54, Hiller, Dean pisze: Oh, and to give you an idea of memory savings, we had a node at 10G RAM usage...we had upped a few nodes to 16G from 8G as we don't have our new nodes ready yet(we know we should be at 8G but we would have a dead cluster if we did that). On startup, the initial RAM is around 6-8G. Startup with index_interval=512 resulted in a 2.5G-2.8G initial RAM and I have seen it grow to 3.3G and back down to 2.8G. We just rolled this out an hour ago. Our website response time is the same as before as well. We rolled to only 2 nodes(out of 6) in our cluster so far to test it out and let it soak a bit. 
We will slowly roll to more nodes monitoring the performance as we go. Also, since dynamic snitch is not working with SimpleSnitch, we know that just one slow node affects our website(from personal pain/experience of nodes hitting RAM limit and slowing down causing website to get real slow). Dean On 3/20/13 6:41 AM, Andras Szerdahelyi andras.szerdahe...@ignitionone.com wrote: 2. Upping index_interval from 128 to 512 (this seemed to reduce our memory usage significantly!!!) I'd be very careful with that as a one-stop improvement solution for two reasons AFAIK 1) you have to rebuild stables ( not an issue if you are evaluating, doing test writes.. Etc, not so much in production ) 2) it can affect reads ( number of sstable reads to serve a read ) especially if your key/row cache is ineffective On 20/03/13 13:34, Hiller, Dean dean.hil...@nrel.gov wrote: Also, look at the cassandra logs. I bet you see the typicalŠblah blah is at 0.85, doing memory cleanup which is not exactly GC but cassandra memory managementŠ..and of course, you have GC on top of that. If you need to get your memory down, there are multiple ways 1. Switching size tiered compaction to leveled compaction(with 1 billion narrow rows, this helped us quite a bit) 2. Upping index_interval from 128 to 512 (this seemed to reduce our memory usage significantly!!!) 3. Just add more nodes as moving the rows to other servers reduces memory from #1 and #2 above since the server would have less rows Later, Dean On 3/20/13 6:29 AM, Andras Szerdahelyi andras.szerdahe...@ignitionone.com wrote: I'd say GC. Please fill in form CASS-FREEZE-001 below and get back to us :-) ( sorry ) How big is your JVM heap ? How many CPUs ? Garbage collection taking long ? ( look for log lines from GCInspector) Running out of heap ? ( heap is .. full log lines ) Any tasks backing up / being dropped ? ( nodetool tpstats and .. dropped in last .. ms log lines ) Are writes really slow? 
(nodetool cfhistograms Keyspace ColumnFamily)
How much is "lots of data"? Wide or skinny rows? Mutations/sec?
Which compaction strategy are you using?
Output of "show schema" (cassandra-cli) for the relevant Keyspace/CF might help as well.
What consistency are you doing your writes with? I assume ONE or ANY, since you have a single node.
What are the values for these settings in cassandra.yaml?
  memtable_total_space_in_mb:
  memtable_flush_writers:
  memtable_flush_queue_size:
  compaction_throughput_mb_per_sec:
  concurrent_writes:
Which version of Cassandra?

Regards,
Andras

From: Joel Samuelsson <samuelsson.j...@gmail.com>
Date: Wednesday 20 March 2013 13:06
To: user@cassandra.apache.org
Subject: Cassandra freezes

Hello,

I've been trying to load test a one-node cassandra cluster. When I add lots of data, the Cassandra node freezes for 4-5 minutes, during which neither reads nor writes are served. During this time, Cassandra takes 100% of a single CPU core. My initial thought was that this was Cassandra flushing memtables to disk; however, the disk I/O is very low during this time.
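Michal's reasoning above - the reader binary-searches the in-memory samples, then scans at most index_interval (on average index_interval / 2) index entries - can be sketched in a few lines. This is an illustrative model, not Cassandra's actual implementation; the key format, interval, and counts are invented for the example.

```python
from bisect import bisect_right

def build_summary(keys, interval=128):
    """Keep every interval-th (key, position) pair in memory."""
    return [(keys[i], i) for i in range(0, len(keys), interval)]

def lookup(keys, summary, target):
    """Return (position of target, index entries scanned after the seek)."""
    sample_keys = [k for k, _ in summary]
    # Seek to the last in-memory sample <= target, then scan forward on "disk".
    start = summary[max(bisect_right(sample_keys, target) - 1, 0)][1]
    for scanned, i in enumerate(range(start, len(keys)), start=1):
        if keys[i] == target:
            return i, scanned
    return -1, 0

keys = [f"key{i:07d}" for i in range(100_000)]  # sorted, like an sstable index
summary = build_summary(keys, interval=512)     # 196 samples instead of 100,000
pos, scanned = lookup(keys, summary, "key0001000")
# The scan cost is bounded by the interval regardless of row size,
# which is exactly why row size drops out of the estimate.
```

Raising the interval shrinks the summary fourfold (128 -> 512) at the cost of a proportionally longer scan per lookup; row size never appears in either term.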
Re: index_interval memory savings in our case (if you are curious)… (and performance result)...
It had only been running for 2 hours back then, but it has been a full 24 hours now and our read-ping program is still showing the same read times, pretty consistently.

Dean

On 3/21/13 1:51 AM, Andras Szerdahelyi <andras.szerdahe...@ignitionone.com> wrote:

Wow. So LCS with a bloom filter fp chance of 0.1 and an index sampling rate of 512, on a column family of 1.7 billion rows per node, yields 100% of reads served from the first sstable? That sounds amazing. And I assume this is cfhistograms output from a node that has been on 512 for a while? (I still think it's unlikely that 1.2.x re-samples sstables on startup - I'm on 1.1.x though.) For LCS with the same fp chance and sampling rate, with 300-500 million rows per node (300-400GB) on 1.1.x, my sstable reads for a single read got pretty much out of control.

On 20/03/13 14:35, Hiller, Dean <dean.hil...@nrel.gov> wrote:

I am using LCS, so the bloom filter fp default for 1.2.2 is 0.1, and my bloom filter size is 1.27G of RAM (nodetool cfstats), with 1.7 billion rows per node. My cfstats for this CF is attached (since cut and paste screwed up the formatting). During testing in QA we were not sure the index_interval change was working, so we dug into the code to find out: it basically seems to convert immediately on startup, though it logs nothing except at debug level, which we don't have on.

Dean

On 3/20/13 6:58 AM, Andras Szerdahelyi <andras.szerdahe...@ignitionone.com> wrote:

I am curious, thanks. (I am in the same situation: big nodes choking under 300-400G of data, 500 million keys.) What does your cfhistograms Keyspace CF output look like? How many sstable reads? What is your bloom filter fp chance?

Regards,
Andras
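Dean's 1.27G bloom filter figure can be sanity-checked against the standard sizing formula m = -n * ln(p) / (ln 2)^2. This is a rough estimate only: Cassandra allocates a filter per sstable and rounds sizes up, so the number reported by cfstats will be somewhat higher than the textbook value.

```python
import math

def bloom_filter_bytes(n_keys, fp_chance):
    """Textbook bloom filter size for n keys at a target false-positive rate."""
    bits = -n_keys * math.log(fp_chance) / (math.log(2) ** 2)
    return bits / 8

# Dean's numbers: 1.7 billion rows per node, LCS default fp chance of 0.1.
gb_at_01 = bloom_filter_bytes(1_700_000_000, 0.1) / 1e9    # ~1.0 GB
# At the older default fp chance of 0.01, the filter doubles in size:
gb_at_001 = bloom_filter_bytes(1_700_000_000, 0.01) / 1e9  # ~2.0 GB
```

That lands in the same ballpark as the observed 1.27G, and shows why fp chance is one of the bigger memory levers at these key counts.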
index_interval memory savings in our case (if you are curious)… (and performance result)...
Oh, and to give you an idea of the memory savings: we had a node at 10G RAM usage. We had upped a few nodes to 16G from 8G, as we don't have our new nodes ready yet (we know we should be at 8G, but we would have a dead cluster if we did that). On startup, the initial RAM is around 6-8G. Startup with index_interval=512 resulted in 2.5G-2.8G initial RAM, and I have seen it grow to 3.3G and back down to 2.8G. We just rolled this out an hour ago. Our website response time is the same as before. We have rolled to only 2 nodes (out of 6) in our cluster so far, to test it out and let it soak a bit. We will slowly roll to more nodes, monitoring performance as we go. Also, since the dynamic snitch does not work with SimpleSnitch, we know that just one slow node affects our website (from personal pain/experience of nodes hitting their RAM limit, slowing down, and making the website very slow).

Dean

On 3/20/13 6:41 AM, Andras Szerdahelyi <andras.szerdahe...@ignitionone.com> wrote:

> 2. Upping index_interval from 128 to 512 (this seemed to reduce our memory usage significantly!!!)

I'd be very careful with that as a one-stop improvement solution, for two reasons, AFAIK:
1) you have to rebuild sstables (not an issue if you are evaluating, doing test writes, etc. - not so much in production)
2) it can affect reads (the number of sstable reads needed to serve a read), especially if your key/row cache is ineffective

From: Joel Samuelsson <samuelsson.j...@gmail.com>
Date: Wednesday 20 March 2013 13:06
To: user@cassandra.apache.org
Subject: Cassandra freezes

Hello,

I've been trying to load test a one-node cassandra cluster. When I add lots of data, the Cassandra node freezes for 4-5 minutes, during which neither reads nor writes are served. During this time, Cassandra takes 100% of a single CPU core. My initial thought was that this was Cassandra flushing memtables to disk; however, the disk I/O is very low during this time. Any idea what my problem could be? I'm running in a virtual environment in which I have no control over the drives, so the commit log and data directory are (probably) on the same drive.

Best regards,
Joel Samuelsson
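A rough model of where Dean's heap savings come from: the index summary keeps one sample per index_interval rows, so going from 128 to 512 drops about three quarters of the samples. The per-sample byte cost below is an assumed figure (sampled key bytes plus JVM object overhead), chosen only to show the arithmetic lands in the same range as the observed 6-8G to 2.5-2.8G drop.

```python
# Assumed per-sample heap cost: sampled key + file position + object overhead.
BYTES_PER_SAMPLE = 400  # a guess for illustration, not a measured value

def summary_samples(rows, interval):
    """In-memory index samples kept for `rows` keys at a given index_interval."""
    return rows // interval

rows = 1_700_000_000
s128 = summary_samples(rows, 128)   # ~13.3 million samples
s512 = summary_samples(rows, 512)   # ~3.3 million samples
saved_gb = (s128 - s512) * BYTES_PER_SAMPLE / 1e9   # roughly 4 GB
```

With these assumptions the saving comes out around 4 GB per node, consistent with the startup-RAM delta Dean reports.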
Re: index_interval memory savings in our case (if you are curious)… (and performance result)...
I am using LCS, so the bloom filter fp default for 1.2.2 is 0.1, and my bloom filter size is 1.27G of RAM (nodetool cfstats), with 1.7 billion rows per node. My cfstats for this CF is attached (since cut and paste screwed up the formatting). During testing in QA we were not sure the index_interval change was working, so we dug into the code to find out: it basically seems to convert immediately on startup, though it logs nothing except at debug level, which we don't have on.

Dean

On 3/20/13 6:58 AM, Andras Szerdahelyi <andras.szerdahe...@ignitionone.com> wrote:

I am curious, thanks. (I am in the same situation: big nodes choking under 300-400G of data, 500 million keys.) What does your cfhistograms Keyspace CF output look like? How many sstable reads? What is your bloom filter fp chance?

Regards,
Andras

[attached cfhistograms excerpt]
Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1       15137     0              0
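For readers trying to reproduce Andras's reading of the attached histogram: the SSTables column of cfhistograms counts, per read, how many sstables were touched, so all the mass at offset 1 means every read was served from a single sstable. A small sketch of that interpretation (the second histogram is hypothetical, invented to show a worse spread):

```python
def single_sstable_fraction(hist):
    """hist maps 'sstables touched per read' -> number of reads."""
    total = sum(hist.values())
    return hist.get(1, 0) / total if total else 0.0

reads_lcs = {1: 15137}                               # Dean's column family
reads_spread = {1: 9000, 2: 4000, 3: 2000, 4: 137}   # hypothetical STCS-like case
# single_sstable_fraction(reads_lcs) is 1.0, matching the "100%" observation.
```

Anything well below 1.0 here means each read pays for multiple index lookups and bloom filter checks, which is where a coarser index_interval starts to hurt.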
configurable index_interval per keyspace
It would be good to have index_interval configurable per keyspace - preferably in cassandra.yaml, because I use it as a tuning knob on nodes running out of memory, without noticeably affecting performance.
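A sketch of the tuning this request describes: picking the read-friendliest interval per keyspace that still fits a heap budget. The 400-bytes-per-sample figure is an assumption carried over from the earlier estimates, and pick_interval is a hypothetical helper, not an existing tool. (For what it's worth, later Cassandra releases - 2.1 and up, if I recall correctly - did make sampling tunable per table via the min_index_interval / max_index_interval schema options.)

```python
def pick_interval(rows, heap_budget_bytes, bytes_per_sample=400,
                  choices=(128, 256, 512, 1024)):
    """Smallest (read-friendliest) interval whose samples fit the heap budget."""
    for interval in choices:
        if rows // interval * bytes_per_sample <= heap_budget_bytes:
            return interval
    return choices[-1]  # budget too small even at the coarsest sampling

# A 1.7-billion-row keyspace with 2 GB of heap budgeted for index samples:
big = pick_interval(1_700_000_000, 2 * 1024**3)   # forced up to 512
# A small keyspace can keep the read-optimal default:
small = pick_interval(1_000_000, 2 * 1024**3)     # stays at 128
```

This is exactly the trade the thread keeps circling: large keyspaces get a coarse interval to survive on heap, small ones keep the default for read latency.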