Re: Compaction Strategy
I suspect that you are CPU bound rather than I/O bound. There are a lot of areas to look into, but I would start with a few. I could not tell much from the results you shared, since no writes were happening at the time. Switching to a different compaction strategy will most likely make things worse for you: as of now you use only one SSTable per read, and STCS is the least expensive compaction type. For starters:

1) Revise cassandra.yaml for the common disk settings, i.e., concurrent_reads, concurrent_writes, etc.
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html

2) Ensure that your OS is optimized for C*:
https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/config/configRecommendedSettings.html

What I would do next is monitor the system. The bottleneck you described is triggered by clients and is out of your control. So:

3) Monitor system resources. If you have DSE, use OpsCenter; otherwise you can use dstat - something like 'dstat -taf' would do it. You will have to run this for a long period of time, until the timeouts occur, so that you get a general idea of which resources are saturating.

4) If this is CPU bound, reduce contention by setting concurrent_compactors to 1 in cassandra.yaml.

5) Monitor GC. There are a lot of tools you can use for this; most of the time it is the GC that is not tuned well. If you are not using G1GC, you might want to switch. You can read about GC tuning briefly here:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsTuneJVM.html
https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/gcPauses.html

6) This sounds naive, but check the logs to see if there is anything interesting there; you can see the GC pauses there as well.

Ali Hubail
Petrolink International Ltd.
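For reference, steps 3-5 above might look roughly like this on a node. This is only a sketch: the log path, output file, and sampling interval are assumptions, so adjust them for your install.

# 3) capture system resources over time; -t adds timestamps so samples
#    can be correlated with the client timeouts later
nohup dstat -taf --output /tmp/dstat-$(hostname).csv 60 >/dev/null 2>&1 &

# 4) if CPU bound, lower compaction concurrency in cassandra.yaml
#    (requires a node restart):
#      concurrent_compactors: 1

# 5) check for long GC pauses that Cassandra itself reports
#    (assumes the default package log location)
grep -i GCInspector /var/log/cassandra/system.log | tail -20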
rajasekhar kommineni 09/20/2018 01:14 PM
Please respond to user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Re: Compaction Strategy

Hi Ali,

Please find my answers:

1) The table holds customer history data. We receive transaction data every day for multiple vendors, and a batch job runs that updates a customer's data if they made any transactions that day and inserts a new row if they are a new customer. Reads happen when a customer visits, to calculate the relevancy of items based on the transactions they have made. I attached the tablestats & tablehistograms output to a file.

2) RAM: 30 GB, CPU: 4 cores, hard drive: Amazon EBS

3) Attached output to file.

Thanks,

> On Sep 20, 2018, at 10:53 AM, Ali Hubail wrote:
> [snip]
Re: Compaction Strategy
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
column_index_size_in_kb: 64
column_index_cache_size_in_kb: 2
compaction_throughput_mb_per_sec: 16
sstable_preemptive_open_interval_in_mb: 50
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 1
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 6
request_timeout_in_ms: 1
slow_query_log_timeout_in_ms: 500
cross_node_timeout: false
endpoint_snitch: SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 60
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
server_encryption_options:
  internode_encryption: none
  keystore: conf/.keystore
  keystore_password: cassandra
  truststore: conf/.truststore
  truststore_password: cassandra
client_encryption_options:
  enabled: false
  optional: false
  keystore: conf/.keystore
  keystore_password: cassandra
internode_compression: dc
inter_dc_tcp_nodelay: false
tracetype_query_ttl: 86400
tracetype_repair_ttl: 604800
enable_user_defined_functions: false
enable_scripted_user_defined_functions: false
windows_timer_interval: 1
transparent_data_encryption_options:
  enabled: false
  chunk_length_kb: 64
  cipher: AES/CBC/PKCS5Padding
  key_alias: testing:1
  key_provider:
    - class_name: org.apache.cassandra.security.JKSKeyProvider
      parameters:
        - keystore: conf/.keystore
          keystore_password: cassandra
          store_type: JCEKS
          key_password: cassandra
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 10
batch_size_warn_threshold_in_kb: 5
batch_size_fail_threshold_in_kb: 50
unlogged_batch_across_partitions_warn_threshold: 10
compaction_large_partition_warning_threshold_mb: 100
gc_warn_threshold_in_ms: 1000
back_pressure_enabled: false
back_pressure_strategy:
  - class_name: org.apache.cassandra.net.RateBasedBackPressure
    parameters:
      - high_ratio: 0.90
        factor: 5
        flow: FAST

> On Sep 20, 2018, at 10:53 AM, Ali Hubail <ali.hub...@petrolink.com> wrote:
> [snip]
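A side note on the compaction_throughput_mb_per_sec: 16 line in the yaml above (the same setting rajasekhar asks about later in the thread): the cap can also be inspected and changed at runtime with nodetool, without a restart. A sketch follows; the values are only examples, and a change made this way does not persist across restarts unless cassandra.yaml is also edited.

nodetool getcompactionthroughput      # show the current cap in MB/s
nodetool setcompactionthroughput 8    # throttle compaction harder during read-heavy periods
nodetool setcompactionthroughput 0    # 0 = unthrottled; use with care
nodetool compactionstats              # verify pending tasks are not piling up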
Re: Compaction Strategy
Hello Rajasekhar,

It's not really clear to me what your workload is. As I understand it, you do heavy writes, but what about reads? So, could you:

1) execute:
nodetool tablestats
nodetool tablehistograms
nodetool compactionstats
We should be able to see the latency, the workload type, and the number of SSTables used per read.

2) specify your hardware specs, i.e., memory size, CPU, number of drives (for data sstables), and type of hard drives (SSD/HDD)

3) share your cassandra.yaml (make sure to sanitize it)

You have a lot of updates, and your data is most likely scattered across different SSTables. Size-tiered compaction strategy (STCS) is much less expensive than leveled compaction strategy (LCS). Stopping the background compaction should be approached with caution; I think your problem is more about why STCS compaction is taking more resources than you expect.

Regards,

Ali Hubail
Petrolink International Ltd

rajasekhar kommineni 09/19/2018 04:44 PM wrote:
> [snip]
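A concrete invocation of the commands Ali asks for might look like this. The keyspace and table names (my_ks, my_table) are placeholders, and on releases before 2.2 the first two commands are named cfstats and cfhistograms instead:

nodetool tablestats my_ks.my_table       # read/write latency, sstable count, partition sizes
nodetool tablehistograms my_ks my_table  # percentiles, incl. sstables touched per read
nodetool compactionstats -H              # pending and active compactions, human-readable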
Re: Compaction Strategy
It's not recommended to disable compaction; you will end up with hundreds to thousands of SSTables and increased read latency. If your data is immutable - meaning no updates/deletes - it will have the least impact. Decreasing compaction throughput will free resources for the application, but don't let too many pending compaction tasks accumulate.

Sent from my iPhone

> On Sep 19, 2018, at 4:44 PM, rajasekhar kommineni wrote:
> [snip]
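For completeness, this is what toggling automatic compaction looks like. As the reply above says, leaving it off for days is risky, so treat this purely as a sketch, with placeholder names:

nodetool disableautocompaction my_ks my_table   # minor compactions stop for this table
nodetool compactionstats                        # pending tasks will grow while disabled
nodetool enableautocompaction my_ks my_table    # re-enable and let it catch up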
Re: Compaction Strategy
Hello,

Can anyone respond to my questions? Is it a good idea to disable auto compaction and schedule it every 3 days? I am unable to control compaction and it is causing timeouts.

Also, will reducing or increasing compaction_throughput_mb_per_sec eliminate timeouts?

Thanks,

> On Sep 17, 2018, at 9:38 PM, rajasekhar kommineni wrote:
>
> Hello Folks,
>
> I need advice in deciding the compaction strategy for my C* cluster. There are multiple jobs that load the data, with fewer inserts and more updates but no deletes. Currently I am using size-tiered compaction, but I am seeing auto compactions kick in after the data load, and also read timeouts during compaction.
>
> Can anyone suggest a good compaction strategy for my cluster that will reduce the timeouts?
>
> Thanks,
Re: Compaction strategy for update heavy workload
> I wouldn't use TWCS if there's updates, you're going to risk having
> data that's never deleted and really small sstables sticking around
> forever.

How do you risk having data sticking around forever when everything is TTL'd?

> If you use really large buckets, what's the point of TWCS?

No one said anything about really large buckets. I'd also note that if the data is so small per partition, it would be entirely reasonable not to bucket by partition key (and window), and thus updates would become irrelevant.

> Honestly this is such a small workload you could easily use STCS or
> LCS and you'd likely never, ever see a problem.

While the numbers sound small, there must be some logical reason to have so many nodes. In my experience STCS and LCS both have their own drawbacks with regard to updates, more so when you have high data density, which sounds like it might be the case here. It's not hard to test these things, and it's important to get them right at the start to save yourself some serious pain down the track.

On 13 June 2018 at 22:41, Jonathan Haddad wrote:
> [snip]
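To illustrate the partition bucketing kurt mentions above, a hypothetical time-series schema might look like the sketch below. All names and values are made up, and the date type assumes C* 2.2+; the point is that a date bucket in the partition key lines partitions up with the TWCS windows:

cqlsh <<'CQL'
CREATE TABLE IF NOT EXISTS my_ks.events (
    source  text,
    bucket  date,        -- day of the event; aligns partitions with daily TWCS windows
    ts      timestamp,
    payload text,
    PRIMARY KEY ((source, bucket), ts)
) WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'DAYS',
                     'compaction_window_size': '1'}
  AND default_time_to_live = 2592000;  -- 30-day TTL
CQL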
Re: Compaction strategy for update heavy workload
I wouldn't use TWCS if there's updates; you're going to risk having data that's never deleted and really small sstables sticking around forever. If you use really large buckets, what's the point of TWCS?

Honestly this is such a small workload you could easily use STCS or LCS and you'd likely never, ever see a problem.

On Wed, Jun 13, 2018 at 3:34 PM kurt greaves wrote:
> [snip]

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
Re: Compaction strategy for update heavy workload
TWCS is probably still worth trying. If you mean updating old rows in TWCS, "out of order updates" will only really mean you'll hit more SSTables on read. This might add a bit of complexity in your client if you're bucketing partitions (not strictly necessary), but that's about it. As long as you're not specifying "USING TIMESTAMP" you still get the main benefit of efficient dropping of SSTables - C* only cares about the *write timestamp* of the data with regard to TTLs, not timestamps stored in your partition/clustering key. Also keep in mind that you can specify the window size in TWCS, so if you can increase it enough to cover the "out of order" updates, then that will also solve the problem w.r.t. old buckets.

In regards to LCS, the only way to really know if it'll be too much compaction overhead is to test it, but for the most part you should consider your read/write ratio rather than the total number of reads/writes (unless it's so small that it's irrelevant, which it may well be).

On 13 June 2018 at 19:25, manuj singh wrote:
> Hi all,
> I am trying to determine the compaction strategy for our use case.
> In our use case we will have updates on a row a few times. And we have a TTL also defined on the table level.
> Our typical workload is less than 1000 writes + reads per second. At the max it could go up to 2500 per second.
> We use SSDs and have around 64 GB of RAM on each node. Our cluster size is around 70 nodes.
>
> I looked at time series, but we can't guarantee that the updates will happen within a given time window. And if we have out-of-order updates, it might impact when we remove that data from the disk.
>
> So I was looking at leveled compaction, which supposedly is good when you have updates. However it is I/O bound and will affect the writes; everywhere I read, it says it's not good for write-heavy workloads.
> But looking at our write velocity, is it really write heavy?
>
> I guess what I am trying to find out is: will leveled compaction impact the writes in our use case, or will it be fine given our write rate is not that much?
> Also, is there anything else I should keep in mind while deciding on the compaction strategy?
>
> Thanks!!
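As a sketch of kurt's window-size suggestion, widening the TWCS window on an existing table looks roughly like this. The table name and window size are placeholders; the idea is to pick a window large enough to cover the out-of-order updates:

cqlsh <<'CQL'
ALTER TABLE my_ks.my_table
  WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'DAYS',
                     'compaction_window_size': '7'};
CQL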
Re: Compaction Strategy guidance
Ah, clear then. SSD usage imposes a different bias in terms of costs ;-)

On Tue, Nov 25, 2014 at 9:48 PM, Nikolai Grigoriev wrote:
> [snip]
Re: Compaction Strategy guidance
Andrei,

Oh, yes, I scanned the top of your previous email but overlooked the last part.

I am using SSDs, so I prefer to put in extra work to keep my system performing and save expensive disk space. So far I've been able to size the system more or less correctly, so these LCS limitations do not cause too much trouble. But I do keep the CF "sharding" option as a backup - for me it will be relatively easy to implement.

On Tue, Nov 25, 2014 at 1:25 PM, Andrei Ivanov wrote:
> [snip]
--
Nikolai Grigoriev
(514) 772-5178
Re: Compaction Strategy guidance
Nikolai,

Just in case you've missed my comment in the thread (guess you have) - increasing the sstable size does nothing (in our case at least). That is, it's not worse, but the load pattern is still the same - doing nothing most of the time. So I switched to STCS, and we will have to live with the extra storage cost - storage is way cheaper than CPU etc. anyhow :-)

On Tue, Nov 25, 2014 at 5:53 PM, Nikolai Grigoriev wrote:
> [snip]
Re: Compaction Strategy guidance
Hi Jean-Armel,

I am using the latest and greatest DSE 4.5.2 (4.5.3 in another cluster, but there are no relevant changes between 4.5.2 and 4.5.3) - thus, Cassandra 2.0.10.

I have about 1.8 TB of data per node now in total, which falls into that range.

As I said, it is really a problem with a large amount of data in a single CF, not the total amount of data. Quite often the nodes are idle yet have quite a few pending compactions. I have discussed it with other members of the C* community and DataStax guys, and they have confirmed my observation.

I believe that increasing the sstable size won't help at all and will probably make things worse - everything else being equal, of course. But I would like to hear from Andrei when he is done with his test.

Regarding the last statement - yes, C* clearly likes many small servers more than fewer large ones. But it is all relative - and can all be recalculated to $$$ :) C* is all about partitioning of everything - storage, traffic... Less data per node and more nodes give you lower latency, lower heap usage, etc., etc. I think I have learned this with my project. The somewhat hard way, but still, nothing is better than personal experience :)

On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce wrote:
> [snip]

--
Nikolai Grigoriev
(514) 772-5178
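For context, the sstable size under discussion is the LCS sstable_size_in_mb option (default 160 MB in the versions discussed here, if I recall the docs correctly). Changing it is a one-line ALTER, though as Andrei and Nikolai report, a larger value did not help in their case. The names and the value below are placeholders:

cqlsh <<'CQL'
ALTER TABLE my_ks.my_table
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': '512'};
CQL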
Re: Compaction Strategy guidance
Yep, Marcus, I know. It's mainly a question of the cost of those extra 2x disks, you know. Our "final" setup will be more like 30 TB, so doubling it is still some cost. But I guess we will have to live with it.

On Tue, Nov 25, 2014 at 1:26 PM, Marcus Eriksson wrote:
> [snip]
Re: Compaction Strategy guidance
If you are that write-heavy, you should definitely go with STCS; LCS optimizes for reads by doing more compactions.

/Marcus

On Tue, Nov 25, 2014 at 11:22 AM, Andrei Ivanov wrote:
> [snip]
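A sketch of Marcus's suggestion - switching the table to STCS. The names are placeholders, and the thresholds shown are simply the usual defaults; note that after the ALTER, existing sstables get reorganized by background compaction, so expect a burst of compaction activity:

cqlsh <<'CQL'
ALTER TABLE my_ks.my_table
  WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                     'min_threshold': '4',
                     'max_threshold': '32'};
CQL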
Re: Compaction Strategy guidance
Hi Jean-Armel, Nikolai,

1. Increasing the sstable size doesn't work (well, I think, unless we "overscale" - add more nodes than really necessary, which is prohibitive for us in a way). Essentially there is no change. I gave up and will go for STCS ;-(
2. We use 2.0.11 as of now.
3. We are running on EC2 c3.8xlarge instances with EBS volumes for data (GP SSD).

Jean-Armel, I believe that what you say about many small instances is absolutely true. But it is not good in our case - we write a lot and almost never read what we've written. That is, we want to be able to read everything, but in reality we hardly read 1%, I think. This implies that smaller instances are of no use in terms of read performance for us. And generally instances/CPU/RAM are more expensive than storage. So we really would like to have instances with large storage.

Andrei.

On Tue, Nov 25, 2014 at 11:23 AM, Jean-Armel Luce wrote:
> [snip]
Re: Compaction Strategy guidance
Hi Andrei, Hi Nicolai,

Which version of C* are you using?

There are some recommendations about the max storage per node:
http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

"For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to handle 10x (3-5TB)."

I have the feeling that those recommendations are sensitive to many criteria, such as:
- your hardware
- the compaction strategy
- ...

It looks like LCS lowers those limitations.

Increasing the size of sstables might help if you have enough CPU and you can put more load on your I/O system (@Andrei, I am interested in the results of your experimentation with large sstable files).

From my point of view, there are some usage patterns where it is better to have many small servers than a few large servers. Probably, it is better to have many small servers if you need LCS for large tables.

Just my 2 cents.

Jean-Armel

2014-11-24 19:56 GMT+01:00 Robert Coli:
> [snip]
Re: Compaction Strategy guidance
On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev wrote:

> One of the obvious recommendations I have received was to run more than
> one instance of C* per host. Makes sense - it will reduce the amount of
> data per node and will make better use of the resources.

This is usually a Bad Idea to do in production.

=Rob
Re: Compaction Strategy guidance
> [...]: 3379391
> Compacted partition mean bytes: 172660
> Average live cells per slice (last five minutes): 495.0
> Average tombstones per slice (last five minutes): 0.0
>
> Another table of similar structure (same number of rows) is about 4x smaller. That table does not suffer from those issues - it compacts well and efficiently.
>
> On Mon, Nov 24, 2014 at 2:30 AM, Jean-Armel Luce wrote:
>> Hi Nikolai,
>>
>> Please could you clarify a little bit what you call "a large amount of data"?
>>
>> How many tables?
>> How many rows in your largest table?
>> How many GB in your largest table?
>> How many GB per node?
>>
>> Thanks.
>>
>> 2014-11-24 8:27 GMT+01:00 Jean-Armel Luce:
>>> Hi Nikolai,
>>>
>>> Thanks for those informations.
>>>
>>> Please could you clarify a little bit what you call "
>>>
>>> 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev:
>>>> Just to clarify - when I was talking about the large amount of data, I really meant a large amount of data per node in a single CF (table). LCS does not seem to like it when it gets thousands of sstables (makes 4-5 levels).
>>>>
>>>> When bootstrapping a new node you'd better enable that option from CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a mess - I have a node that I bootstrapped ~2 weeks ago. Initially it had 7.5K pending compactions; now it has almost stabilized at 4.6K and does not go down. The number of sstables at L0 is over 11K, and it is slowly, slowly building the upper levels. The total number of sstables is 4x the normal amount. Now I am not entirely sure this node will ever get back to normal life. And believe me - this is not because of I/O: I have SSDs everywhere and 16 physical cores, and this machine is barely using 1-3 cores most of the time. The problem is that allowing the STCS fallback is not a good option either - it will quickly result in a few 200GB+ sstables in my configuration, and then those sstables will never be compacted. Plus, it will require close to 2x disk space on EVERY disk in my JBOD configuration... this will kill the node sooner or later. This is all because all sstables after bootstrap end up at L0, and then the process slowly, slowly moves them to other levels. If you have write traffic to that CF, then the number of sstables at L0 will grow quickly - like it happens in my case now.
>>>>
>>>> Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301 is implemented it may be better.
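The CASSANDRA-6621 option Nikolai mentions is, as far as the ticket describes it, a JVM system property. A sketch of enabling it before bootstrapping a node follows; the property name is taken from the ticket rather than from this thread, so verify it against your Cassandra version:

# in cassandra-env.sh on the node about to bootstrap:
JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"

# after the bootstrap, watch L0 drain into the higher levels:
nodetool compactionstats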
Re: Compaction Strategy guidance
> [snip]
>
> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov wrote:
>> Stephane,
>>
>> We are having a somewhat similar C* load profile. Hence some comments in addition to Nikolai's answer.
>> 1. Fallback to STCS - you can disable it, actually.
>> 2. Based on our experience, if you have a lot of data per node, LCS may work just fine. That is, till the moment you decide to join another node - chances are that the newly added node will not be able to compact what it gets from the old nodes. In your case, if you switch strategy, the same thing may happen. This is all due to the limitations mentioned by Nikolai.
>>
>> Andrei
>>
>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. wrote:
>>> ABUSE
>>>
>>> I DON'T WANT ANY MORE MAILS, I AM FROM MEXICO
>>>
>>> From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
>>> Sent: Saturday, 22 November 2014, 07:13 p.m.
>>> To: user@cassandra.apache.org
>>> Subject: Re: Compaction Strategy guidance
>>> Importance: High
>>>
>>> Stephane,
>>>
>>> As everything good, LCS comes at a certain price.
>>>
>>> LCS will put the most load on your I/O system (if you use spindles, you may need to be careful about that) and on CPU. Also, LCS (by default) may fall back to STCS if it is falling behind (which is very possible with heavy write activity), and this will result in higher disk space usage. Also, LCS has a certain limitation I have discovered lately: sometimes LCS may not be able to use all your node's resources (algorithm limitations), and this reduces the overall compaction throughput. This may happen if you have a large column family with lots of data per node. STCS won't have this limitation.
>>>
>>> By the way, the primary goal of LCS is to reduce the number of sstables C* has to look at to find your data.
With LCS properly functioning >> >>>>> > this >> >>>>> > number >> >>>>> > will be most likely between something like 1 and 3 for most of the >> >>>>> > reads. >> >>>>> > But if you do few reads and not concerned about the latency today, >> >>>>> > most >> >>>>> > likely LCS may only save you some disk space. >> >>>>> > >> >>>>> > >> >>>>> > >> >>>>> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay >> >>>>> > >> >>>>> > wrote: >> >>>>> > >> >>>>> > Hi there, >> >>>>> > >> >>>>> > >> >>>>> > >> >>>>> > use case: >> >>>>> > >> >>>>> > >> >>>>> > >> >>>>> > - Heavy write app, few reads. >> >>>>> > >> >>>>> > - Lots of updates of rows / columns. >> >>>>> > >> >>>>> > - Current performance is fine, for both writes and reads.. >> >>>>> > >> >>>>> > - Currently using SizedCompactionStrategy >> >>>>> > >> >>>>> > >> >>>>> > >> >>>>> > We're trying to limit the amount of storage used during >> >>>>> > compaction. >> >>>>> > Should >> >>>>> > we switch to LeveledCompactionStrategy? >> >>>>> > >> >>>>> > >> >>>>> > >> >>>>> > Thanks >> >>>>> > >> >>>>> > >> >>>>> > >> >>>>> > >> >>>>> > -- >> >>>>> > >> >>>>> > Nikolai Grigoriev >> >>>>> >> >>>> >> >>>> >> >>>> >> >>>> -- >> >>>> Nikolai Grigoriev >> >>>> >> >>> >> >> >> > >> > >> > >> > -- >> > Nikolai Grigoriev >> > > > > > > -- > Nikolai Grigoriev > (514) 772-5178
Re: Compaction Strategy guidance
later. This is all because all sstables after bootstrap end at L0 and > then > >>>> the process slowly slowly moves them to other levels. If you have > write > >>>> traffic to that CF then the number of sstables and L0 will grow > quickly - > >>>> like it happens in my case now. > >>>> > >>>> Once something like > https://issues.apache.org/jira/browse/CASSANDRA-8301 > >>>> is implemented it may be better. > >>>> > >>>> > >>>> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov > >>>> wrote: > >>>>> > >>>>> Stephane, > >>>>> > >>>>> We are having a somewhat similar C* load profile. Hence some comments > >>>>> in addition Nikolai's answer. > >>>>> 1. Fallback to STCS - you can disable it actually > >>>>> 2. Based on our experience, if you have a lot of data per node, LCS > >>>>> may work just fine. That is, till the moment you decide to join > >>>>> another node - chances are that the newly added node will not be able > >>>>> to compact what it gets from old nodes. In your case, if you switch > >>>>> strategy the same thing may happen. This is all due to limitations > >>>>> mentioned by Nikolai. > >>>>> > >>>>> Andrei, > >>>>> > >>>>> > >>>>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. > > >>>>> wrote: > >>>>> > ABUSE > >>>>> > > >>>>> > > >>>>> > > >>>>> > YA NO QUIERO MAS MAILS SOY DE MEXICO > >>>>> > > >>>>> > > >>>>> > > >>>>> > De: Nikolai Grigoriev [mailto:ngrigor...@gmail.com] > >>>>> > Enviado el: sábado, 22 de noviembre de 2014 07:13 p. m. > >>>>> > Para: user@cassandra.apache.org > >>>>> > Asunto: Re: Compaction Strategy guidance > >>>>> > Importancia: Alta > >>>>> > > >>>>> > > >>>>> > > >>>>> > Stephane, > >>>>> > > >>>>> > As everything good, LCS comes at certain price. > >>>>> > > >>>>> > LCS will put most load on you I/O system (if you use spindles - you > >>>>> > may need > >>>>> > to be careful about that) and on CPU. Also LCS (by default) may > fall > >>>>> > back to > >>>>> > STCS if it is falling behind (which is very possible with heavy > >>>>> > writing > >>>>> > activity) and this will result in higher disk space usage. Also LCS > >>>>> > has > >>>>> > certain limitation I have discovered lately. Sometimes LCS may not > be > >>>>> > able > >>>>> > to use all your node's resources (algorithm limitations) and this > >>>>> > reduces > >>>>> > the overall compaction throughput. This may happen if you have a > >>>>> > large > >>>>> > column family with lots of data per node. STCS won't have this > >>>>> > limitation. > >>>>> > > >>>>> > > >>>>> > > >>>>> > By the way, the primary goal of LCS is to reduce the number of > >>>>> > sstables C* > >>>>> > has to look at to find your data. With LCS properly functioning > this > >>>>> > number > >>>>> > will be most likely between something like 1 and 3 for most of the > >>>>> > reads. > >>>>> > But if you do few reads and not concerned about the latency today, > >>>>> > most > >>>>> > likely LCS may only save you some disk space. > >>>>> > > >>>>> > > >>>>> > > >>>>> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay > >>>>> > > >>>>> > wrote: > >>>>> > > >>>>> > Hi there, > >>>>> > > >>>>> > > >>>>> > > >>>>> > use case: > >>>>> > > >>>>> > > >>>>> > > >>>>> > - Heavy write app, few reads. > >>>>> > > >>>>> > - Lots of updates of rows / columns. > >>>>> > > >>>>> > - Current performance is fine, for both writes and reads.. > >>>>> > > >>>>> > - Currently using SizedCompactionStrategy > >>>>> > > >>>>> > > >>>>> > > >>>>> > We're trying to limit the amount of storage used during compaction. 
> >>>>> > Should > >>>>> > we switch to LeveledCompactionStrategy? > >>>>> > > >>>>> > > >>>>> > > >>>>> > Thanks > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > -- > >>>>> > > >>>>> > Nikolai Grigoriev > >>>>> > >>>> > >>>> > >>>> > >>>> -- > >>>> Nikolai Grigoriev > >>>> > >>> > >> > > > > > > > > -- > > Nikolai Grigoriev > > > -- Nikolai Grigoriev (514) 772-5178
Re: Compaction Strategy guidance
>> >>>>> Stephane, >>>>> >>>>> We are having a somewhat similar C* load profile. Hence some comments >>>>> in addition Nikolai's answer. >>>>> 1. Fallback to STCS - you can disable it actually >>>>> 2. Based on our experience, if you have a lot of data per node, LCS >>>>> may work just fine. That is, till the moment you decide to join >>>>> another node - chances are that the newly added node will not be able >>>>> to compact what it gets from old nodes. In your case, if you switch >>>>> strategy the same thing may happen. This is all due to limitations >>>>> mentioned by Nikolai. >>>>> >>>>> Andrei, >>>>> >>>>> >>>>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. >>>>> wrote: >>>>> > ABUSE >>>>> > >>>>> > >>>>> > >>>>> > YA NO QUIERO MAS MAILS SOY DE MEXICO >>>>> > >>>>> > >>>>> > >>>>> > De: Nikolai Grigoriev [mailto:ngrigor...@gmail.com] >>>>> > Enviado el: sábado, 22 de noviembre de 2014 07:13 p. m. >>>>> > Para: user@cassandra.apache.org >>>>> > Asunto: Re: Compaction Strategy guidance >>>>> > Importancia: Alta >>>>> > >>>>> > >>>>> > >>>>> > Stephane, >>>>> > >>>>> > As everything good, LCS comes at certain price. >>>>> > >>>>> > LCS will put most load on you I/O system (if you use spindles - you >>>>> > may need >>>>> > to be careful about that) and on CPU. Also LCS (by default) may fall >>>>> > back to >>>>> > STCS if it is falling behind (which is very possible with heavy >>>>> > writing >>>>> > activity) and this will result in higher disk space usage. Also LCS >>>>> > has >>>>> > certain limitation I have discovered lately. Sometimes LCS may not be >>>>> > able >>>>> > to use all your node's resources (algorithm limitations) and this >>>>> > reduces >>>>> > the overall compaction throughput. This may happen if you have a >>>>> > large >>>>> > column family with lots of data per node. STCS won't have this >>>>> > limitation. >>>>> > >>>>> > >>>>> > >>>>> > By the way, the primary goal of LCS is to reduce the number of >>>>> > sstables C* >>>>> > has to look at to find your data. With LCS properly functioning this >>>>> > number >>>>> > will be most likely between something like 1 and 3 for most of the >>>>> > reads. >>>>> > But if you do few reads and not concerned about the latency today, >>>>> > most >>>>> > likely LCS may only save you some disk space. >>>>> > >>>>> > >>>>> > >>>>> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay >>>>> > >>>>> > wrote: >>>>> > >>>>> > Hi there, >>>>> > >>>>> > >>>>> > >>>>> > use case: >>>>> > >>>>> > >>>>> > >>>>> > - Heavy write app, few reads. >>>>> > >>>>> > - Lots of updates of rows / columns. >>>>> > >>>>> > - Current performance is fine, for both writes and reads.. >>>>> > >>>>> > - Currently using SizedCompactionStrategy >>>>> > >>>>> > >>>>> > >>>>> > We're trying to limit the amount of storage used during compaction. >>>>> > Should >>>>> > we switch to LeveledCompactionStrategy? >>>>> > >>>>> > >>>>> > >>>>> > Thanks >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > -- >>>>> > >>>>> > Nikolai Grigoriev >>>>> >>>> >>>> >>>> >>>> -- >>>> Nikolai Grigoriev >>>> >>> >> > > > > -- > Nikolai Grigoriev >
Re: Compaction Strategy guidance
Jean-Armel,

I have only two large tables; the rest is super-small. In the test cluster of 15 nodes the largest table has about 110M rows. Its total size is about 1.26 TB per node (total disk space used per node for that CF). It's got about 5K sstables per node - the sstable size is 256MB. cfstats on a "healthy" node look like this:

    Read Count: 8973748
    Read Latency: 16.130059053251774 ms.
    Write Count: 32099455
    Write Latency: 1.6124713938912671 ms.
    Pending Tasks: 0
        Table: wm_contacts
        SSTable count: 5195
        SSTables in each level: [27/4, 11/10, 104/100, 1053/1000, 4000, 0, 0, 0, 0]
        Space used (live), bytes: 1266060391852
        Space used (total), bytes: 1266144170869
        SSTable Compression Ratio: 0.32604853410787327
        Number of keys (estimate): 25696000
        Memtable cell count: 71402
        Memtable data size, bytes: 26938402
        Memtable switch count: 9489
        Local read count: 8973748
        Local read latency: 17.696 ms
        Local write count: 32099471
        Local write latency: 1.732 ms
        Pending tasks: 0
        Bloom filter false positives: 32248
        Bloom filter false ratio: 0.50685
        Bloom filter space used, bytes: 20744432
        Compacted partition minimum bytes: 104
        Compacted partition maximum bytes: 3379391
        Compacted partition mean bytes: 172660
        Average live cells per slice (last five minutes): 495.0
        Average tombstones per slice (last five minutes): 0.0

Another table of similar structure (same number of rows) is about 4x smaller. That table does not suffer from those issues - it compacts well and efficiently.
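Those per-level sstable counts follow directly from LCS's 10x fan-out. As a rough illustration, here is a back-of-the-envelope Python sketch (not Cassandra code; it assumes the default 10x fan-out and the 256MB sstable size from the message above, and ignores L0 and compression effects) of the distribution one would expect for ~1.27 TB of data per node:

    # Back-of-the-envelope sketch of LCS level targets; not Cassandra code.
    # Assumes the default 10x fan-out; ignores L0 and compression effects.
    def lcs_level_targets(total_bytes, sstable_bytes=256 * 1024**2, fanout=10):
        levels = []
        level_cap = fanout * sstable_bytes  # L1 holds 10 sstables, L2 100, ...
        remaining = total_bytes
        while remaining > 0:
            take = min(remaining, level_cap)
            levels.append(round(take / sstable_bytes))
            remaining -= take
            level_cap *= fanout
        return levels

    print(lcs_level_targets(1266060391852))
    # -> [10, 100, 1000, 3606], close to the observed [11, 104, 1053, 4000]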
Re: Compaction Strategy guidance
Jean-Armel,

I have the same problem/state as Nikolai. Here are my stats:

~ 1 table
~ 10B records
~ 2TB/node x 6 nodes

Nikolai, I'm sort of wondering if switching to some larger sstable_size_in_mb (say 4096 or 8192 or something) with LCS may be a solution, even if not absolutely permanent? As for huge sstables, I do already have some 400-500GB sstables. The only idea I have for how to compact them in the future is to offline split them at some point. Does that make sense? (I'm still doing a test drive and really need to understand how we are going to handle this in production.)

Andrei
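Both steps Andrei describes can be sketched concretely. The keyspace/table names and data path below are placeholders, and the exact sstablesplit flags should be checked against your Cassandra version, so treat this as an illustration rather than a recipe:

    -- raising the LCS sstable size (CQL; names are placeholders):
    ALTER TABLE myks.mytable
      WITH compaction = {'class': 'LeveledCompactionStrategy',
                         'sstable_size_in_mb': 4096};

    # offline split of oversized sstables with the tool shipped in
    # Cassandra's tools/bin (the node must be stopped; path illustrative):
    sstablesplit --size 4096 /var/lib/cassandra/data/myks/mytable/*-Data.db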
Re: Compaction Strategy guidance
Hi Nikolai,

Please could you clarify a little bit what you call "a large amount of data"?

How many tables?
How many rows in your largest table?
How many GB in your largest table?
How many GB per node?

Thanks.
Re: Compaction Strategy guidance
Hi Nikolai,

Thanks for that information.

Please could you clarify a little bit what you call "a large amount of data"?
Re: Compaction Strategy guidance
Just to clarify - when I was talking about the large amount of data, I really meant a large amount of data per node in a single CF (table). LCS does not seem to like it when it gets thousands of sstables (makes 4-5 levels).

When bootstrapping a new node you'd better enable that option from CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a mess - I have a node that I bootstrapped ~2 weeks ago. Initially it had 7.5K pending compactions; now it has almost stabilized at 4.6K and does not go down. The number of sstables at L0 is over 11K and it is slowly building the upper levels. The total number of sstables is 4x the normal amount. Now I am not entirely sure this node will ever get back to normal life. And believe me - this is not because of I/O: I have SSDs everywhere and 16 physical cores, and this machine is barely using 1-3 cores most of the time. The problem is that allowing the STCS fallback is not a good option either - it will quickly result in a few 200GB+ sstables in my configuration, and then these sstables will never be compacted. Plus, it will require close to 2x disk space on EVERY disk in my JBOD configuration... this will kill the node sooner or later. This is all because all sstables after bootstrap end up at L0, and then the process slowly moves them to other levels. If you have write traffic to that CF, then the number of sstables in L0 will grow quickly - like it happens in my case now.

Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301 is implemented it may be better.

--
Nikolai Grigoriev
(514) 772-5178
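A sketch of enabling that CASSANDRA-6621 behaviour; the system property name comes from that ticket, so verify it against your Cassandra version before relying on it:

    # in cassandra-env.sh (or on the startup command line), then restart:
    JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"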
Re: Compaction Strategy guidance
Stephane,

We have a somewhat similar C* load profile, hence some comments in addition to Nikolai's answer.

1. Fallback to STCS - you can actually disable it.
2. Based on our experience, if you have a lot of data per node, LCS may work just fine - that is, until the moment you decide to join another node. Chances are that the newly added node will not be able to compact what it gets from the old nodes. In your case, if you switch strategy, the same thing may happen. This is all due to the limitations mentioned by Nikolai.

Andrei
RE: Compaction Strategy guidance
ABUSE

I DO NOT WANT ANY MORE EMAILS, I AM FROM MEXICO

From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
Sent: Saturday, November 22, 2014, 07:13 PM
To: user@cassandra.apache.org
Subject: Re: Compaction Strategy guidance
Importance: High
Re: Compaction Strategy guidance
Stephane,

Like everything good, LCS comes at a certain price.

LCS will put the most load on your I/O system (if you use spindles you may need to be careful about that) and on CPU. Also, LCS (by default) may fall back to STCS if it is falling behind (which is very possible with heavy write activity), and this will result in higher disk space usage. LCS also has a certain limitation I discovered lately: sometimes LCS may not be able to use all your node's resources (algorithm limitations), and this reduces the overall compaction throughput. This may happen if you have a large column family with lots of data per node. STCS won't have this limitation.

By the way, the primary goal of LCS is to reduce the number of sstables C* has to look at to find your data. With LCS properly functioning, this number will most likely be between 1 and 3 for most of the reads. But if you do few reads and are not concerned about the latency today, LCS will most likely only save you some disk space.

On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay wrote:

> Hi there,
>
> use case:
>
> - Heavy write app, few reads.
> - Lots of updates of rows / columns.
> - Current performance is fine, for both writes and reads.
> - Currently using SizedCompactionStrategy
>
> We're trying to limit the amount of storage used during compaction.
> Should we switch to LeveledCompactionStrategy?
>
> Thanks

--
Nikolai Grigoriev
(514) 772-5178
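That sstables-per-read figure is directly observable. A hedged illustration (keyspace and table names are placeholders; on newer versions the command is nodetool tablehistograms):

    # show read latency and sstables-touched-per-read percentiles:
    nodetool cfhistograms myks mytable
    # With LCS working properly, the SSTables column should sit around
    # 1-3 at the high percentiles, as described above.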
Re: compaction strategy
You are of course free to reduce the min per bucket to 2.

The fundamental idea of sstables + compaction is to trade disk space for higher write performance. For most applications this is the right trade to make on modern hardware... I don't think you'll get very far trying to get the 2nd without the 1st.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
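Lowering the per-bucket minimum can be done at runtime; a hedged example with placeholder keyspace/table names:

    # min threshold 2, max threshold left at the default 32:
    nodetool setcompactionthreshold myks mycf 2 32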
Re: compaction strategy
Sorry, I am not trying to pretend anything or blow it out of proportion. Just reacting to what I see. This is what I see after some stress testing on some pretty decent HW:

    81  Up  Normal  181.6 GB   8.33%  Token(bytes[30])
    82  Up  Normal  501.43 GB  8.33%  Token(bytes[313230])
    83  Up  Normal  248.07 GB  8.33%  Token(bytes[313437])
    84  Up  Normal  349.64 GB  8.33%  Token(bytes[313836])
    85  Up  Normal  511.55 GB  8.33%  Token(bytes[323336])
    86  Up  Normal  654.93 GB  8.33%  Token(bytes[333234])
    87  Up  Normal  534.77 GB  8.33%  Token(bytes[333939])
    88  Up  Normal  525.88 GB  8.33%  Token(bytes[343739])
    89  Up  Normal  476.6 GB   8.33%  Token(bytes[353730])
    90  Up  Normal  424.89 GB  8.33%  Token(bytes[363635])
    91  Up  Normal  338.14 GB  8.33%  Token(bytes[383036])
    92  Up  Normal  546.95 GB  8.33%  Token(bytes[6a])

.81 has been exposed to a full compaction. It had ~370GB before that, and the resulting sstable is 165GB. The other nodes have only been doing minor compactions.

I think this is a problem. You are of course free to disagree. I do however recommend doing a simulation of potential worst case scenarios if many of the buckets end up with 3 sstables and don't compact for a while. The disk space requirements get pretty bad even without getting into theoretical worst cases.

Regards,
Terje
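For reference, the full compaction described above can be triggered and observed per node; a hedged example with placeholder names:

    # trigger a major compaction of one column family, then watch progress:
    nodetool compact myks mycf
    nodetool compactionstats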
Re: compaction strategy
On Tue, May 10, 2011 at 6:20 PM, Terje Marthinussen wrote:
> Well, you do kind of merge them 2 by 2 as you look for at least 4 at a time ;)
> But yes, 20MB should become at least 80MB. Still quite a few hops to reach 100GB.

Not sure I follow you. 4 sstables is the minimum a compaction looks for (by default). If there are 30 sstables of ~20MB sitting there because compaction is behind, you will compact those 30 sstables together (unless there is not enough space for that, and assuming you haven't changed the max compaction threshold, which is 32 by default). And you can increase the max threshold. Don't get me wrong, I'm not pretending this works better than it does, but let's not pretend either that it's worse than it is.

> The nasty side effect I am scared of is disk space and to keep the disk
> space under control, I need to get down to 1 file.
> [...]
> You may disagree, but I think this is a problem.

I absolutely do not disagree. I was just arguing that I'm not sure triggering a major compaction based on some fuzzy heuristic is a good solution to the problem. And we do know that compaction could and should be improved, both to make it have less impact on reads when it's behind:
https://issues.apache.org/jira/browse/CASSANDRA-2498
to allow for easily testing different strategies:
https://issues.apache.org/jira/browse/CASSANDRA-1610
as well as redesigning the mechanism:
https://issues.apache.org/jira/browse/CASSANDRA-1608

You'll see in particular in that last ticket's comments that segmenting on token space has been suggested already, and there are probably a handful of threads about vnodes in the mailing list archives. And I personally think that yes, partitioning the sstables is a good idea.
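The bucketing behaviour being debated here fits in a few lines. This is a simplified Python sketch of the size-tiered grouping Sylvain describes, not Cassandra's actual code; the 0.5/1.5 similarity bounds and the 4/32 thresholds mirror the defaults discussed above:

    # Simplified sketch of size-tiered bucketing; not Cassandra's code.
    def stcs_buckets(sizes, bucket_low=0.5, bucket_high=1.5,
                     min_threshold=4, max_threshold=32):
        buckets = []  # each bucket: [running average size, [sizes...]]
        for size in sorted(sizes):
            for b in buckets:
                if b[0] * bucket_low <= size <= b[0] * bucket_high:
                    b[1].append(size)            # similar size: same bucket
                    b[0] = sum(b[1]) / len(b[1])
                    break
            else:
                buckets.append([size, [size]])   # start a new bucket
        # compaction candidates: buckets with enough similar sstables,
        # taking at most max_threshold sstables in one go
        return [b[1][:max_threshold] for b in buckets
                if len(b[1]) >= min_threshold]

    # 30 sstables of ~20MB stuck behind: one bucket, all 30 merged at once
    print(stcs_buckets([20] * 30))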
Re: compaction strategy
> Everyone may be well aware of that, but I'll still remark that a minor
> compaction will try to merge "as many 20MB sstables as it can" up to
> the max compaction threshold (which is configurable). [...] let's keep
> in mind we don't merge sstables 2 by 2.

Well, you do kind of merge them 2 by 2 as you look for at least 4 at a time ;) But yes, 20MB should become at least 80MB. Still quite a few hops to reach 100GB.

> I'm also not too much in favor of triggering major compactions, because
> it mostly has a nasty effect (create one huge sstable). Now maybe we
> could expose the difference factor for which we'll consider sstables in
> the same bucket.

The nasty side effect I am scared of is disk space, and to keep the disk space under control I need to get down to 1 file.

As an example: 2 days ago, I looked at a system that had gone idle from compaction with something like 24 sstables. Disk use was 370GB. After manually triggering full compaction, I was left with a single sstable which is 164GB large. This means I may need more than 3x the full dataset to survive if certain nasty events such as repairs or anticompactions should occur. Way more than the recommended 2x.

In the same system, I see nodes reaching up towards 900GB during compaction and 5-600GB otherwise. This is with OPP, so the distribution is not 100% perfect, but I expect these 5-600GB nodes to compact down to the <200GB area if a full compaction is triggered. That is way beyond the recommendation to have 2x the disk space.

You may disagree, but I think this is a problem. Either we need to recommend 3-5x the best case disk usage or we need to fix cassandra.

A simple improvement initially may be to change the bucketing strategy if you cannot find suitable candidates. I believe lucene, for instance, has a strategy where it can mix a set of small index fragments with one large one. This may be possible to consider as a fallback strategy, and just let cassandra compact down to 1 file whenever it can.

Ultimately, I think segmenting on token space is the only way to fix this. That segmentation could be done by building histograms of your token distribution as you compact, and the compaction can further adjust the segments accordingly as full compactions take place. This would seem simpler to do than a full vnode based infrastructure.

Terje
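To actually do the math on the rewrites: every size tier rewrites the full data set once, so the cost of climbing from flush-sized sstables up to one big file can be estimated directly. A rough Python sketch under the assumptions stated in the comments (20MB flushes, merging 4 similar sstables at a time):

    # Rough estimate of write amplification for tiered minor compactions.
    # Assumes 20MB flushed sstables and merges of 4 similar sstables.
    import math

    flush_mb = 20
    target_mb = 100 * 1024   # ~100GB end result
    merge_factor = 4         # min sstables merged per minor compaction
    tiers = math.ceil(math.log(target_mb / flush_mb, merge_factor))
    print(tiers)  # -> 7 tiers: 20MB -> 80MB -> 320MB -> ... -> 100GB+
    # i.e. each byte is rewritten ~7 times on the way up, versus once
    # for a single major compaction of the same data.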
Re: compaction strategy
Sorry, I was referring to the claim that "one big file" was a problem, not the non-overlapping part.

If you never compact to a single file, you never get rid of all generations/duplicates. With non-overlapping files covering small enough token ranges, compacting down to one file is not a big issue.

Terje
Re: compaction strategy
If they each have their own copy of the data, then they are *not* non-overlapping!

If you have non-overlapping SSTables (and you know the min/max keys), it's like having one big SSTable because you know exactly where each row is, and it becomes easy to merge a new SSTable in small batches, rather than in one huge batch.

The only step that you have to add to the current merge process is, when you're going to write a new SSTable, if it's too big, to write N (non-overlapping!) pieces instead.
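The "write N non-overlapping pieces" step is easy to picture as code. A toy Python sketch of the idea, with illustrative names only (this is not how Cassandra's writer is structured):

    # Toy sketch: stream merged, sorted rows out and roll over to a new
    # (non-overlapping) sstable whenever the current piece hits a size cap.
    def write_split_sstables(merged_rows, max_bytes):
        """merged_rows: iterable of (key, row_bytes) in sorted key order."""
        pieces, current, size = [], [], 0
        for key, row in merged_rows:
            if current and size + len(row) > max_bytes:
                pieces.append(current)   # flush this piece; its key range
                current, size = [], 0    # is disjoint from the next one
            current.append((key, row))
            size += len(row)
        if current:
            pieces.append(current)
        return pieces  # each piece covers a distinct, sorted key range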
Re: compaction strategy
Yes, agreed. I actually think cassandra has to.

And if you do not go down to that single file, how do you avoid getting into a situation where you can very realistically end up with 4-5 big sstables, each having its own copy of the same data, massively increasing disk requirements?

Terje
Re: compaction strategy
> I'm also not too much in favor of triggering major compactions, because
> it mostly has a nasty effect (create one huge sstable).

If that is the case, why can't major compactions create many non-overlapping SSTables?

In general, it seems to me that non-overlapping SSTables have all the advantages of big SSTables (i.e. you know exactly where the data is) without the disadvantages that come with being big. Why doesn't Cassandra take advantage of that in a major way?
Re: compaction strategy
On Sat, May 7, 2011 at 7:20 PM, Terje Marthinussen wrote:
> As things start falling behind, you have a bunch of minor compactions
> trying to merge 20MB (sstables cassandra generally dumps with current
> config when under pressure) into 40MB, into 80MB, into ...

Everyone may be well aware of that, but I'll still remark that a minor compaction will try to merge "as many 20MB sstables as it can" up to the max compaction threshold (which is configurable). So if you do accumulate some newly created sstables at some point in time, the next minor compaction will take all of them, and thus not create a 40MB sstable, then 80MB, etc... Sure, there will be more steps than with a major compaction, but let's keep in mind we don't merge sstables 2 by 2.

I'm also not too much in favor of triggering major compactions, because it mostly has a nasty effect (create one huge sstable). Now maybe we could expose the difference factor for which we'll consider sstables in the same bucket (i.e., of similar size).

As a side note, I think that https://issues.apache.org/jira/browse/CASSANDRA-1610, if done correctly, could help in such situations, in that one could try a strategy adapted to one's workload.
Re: compaction strategy
> It does not really make sense to me to go through all these minor merges
> when a full compaction will do a much faster and better job.

In a system heavily reliant on caching (platter drives, data sizes much larger than RAM), major compactions can be very detrimental to performance due to the effects of the temporary spike in data size and cache coldness. Sounds like it makes good sense in your situation, though.

--
/ Peter Schuller
Re: compaction strategy
This is an all-SSD system. I have no problems with read/write performance due to I/O. I do have a potential problem with the crazy explosion you can get in terms of disk use if compaction cannot keep up.

As things fall behind and you get many generations of data, yes, read performance becomes a problem due to the number of sstables.

As things start falling behind, you have a bunch of minor compactions trying to merge 20MB (sstables cassandra generally dumps with current config when under pressure) into 40MB, into 80MB, into ...

Anyone want to do the math on how many times you are rewriting the data going this route?

There is just no way this can keep up. It will just fall more and more behind. The only way to recover, as far as I can see, would be to trigger a full compaction? It does not really make sense to me to go through all these minor merges when a full compaction will do a much faster and better job.

Terje
Re: compaction strategy
> If you are seeing 600 pending compaction tasks regularly you almost
> definitely need more hardware.

Note that "pending compactions" is pretty misleading, and you can't really draw conclusions just from the pending compactions number/graph. For example, standard behavior during e.g. a long repair may end up accumulating thousands of pending compactions that suddenly drop to zero once you're done, and a bunch of tasks that don't actually need to do anything are "completed". With the concurrent compaction support, I suppose this will be mitigated as long as you don't hit your concurrency limit.

--
/ Peter Schuller
Re: compaction strategy
Adjusting compaction and memtable settings is a tuning thing. Tuning is not usually a game changer. Every once in a while you hit something wonderful and get a 20% or 30% enhancement, but normally you are in the 1%-3% gain range.

If you are seeing 600 pending compaction tasks regularly, you almost definitely need more hardware.
Re: compaction strategy
On Sat, May 7, 2011 at 2:01 AM, Terje Marthinussen wrote:
> 1. Would it make sense to make full compactions occur a bit more aggressive.

I'd rather reduce the performance impact of being behind than do more full compactions: https://issues.apache.org/jira/browse/CASSANDRA-2498

> 2. I would think the code should be smart enough to either trigger a full
> compaction and scrap the current queue, or at least merge some of those
> pending tasks into larger ones

Not crazy, but a queue-rewriter would be nontrivial. For now I'm okay with saying "add capacity until compaction can mostly keep up." (Most people's problem is making compaction LESS aggressive, hence https://issues.apache.org/jira/browse/CASSANDRA-2156.)

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
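On that last point, compaction aggressiveness can be throttled at runtime once the CASSANDRA-2156 throttling is available in your version; a hedged example:

    # cap compaction throughput at 16 MB/s for the whole node
    # (0 would disable throttling entirely):
    nodetool setcompactionthroughput 16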