Re: Leveled Compaction Strategy with a really intensive delete workload
> …due to a really intensive delete workloads, the SSTable is promoted to t…

Is Cassandra designed for *delete* workloads? I doubt it. Perhaps look at some other alternative like TTLs?

jason
Re: Leveled Compaction Strategy with a really intensive delete workload
Hi all,

Thanks for your answers! Yes, I agree that a delete-intensive workload is not something Cassandra is designed for. Unfortunately this is to cope with some unexpected data transformations that I hope are a temporary thing. We chose the LCS strategy because of really wide rows which were spanning several SSTables with other compaction strategies (and hence leading to high-latency read queries). I was honestly thinking of scrapping the SSTables and rebuilding them from scratch if this workload is confirmed to be temporary. Knowing the answer to my question above would help me second-guess my decision a bit less :)

Cheers,
Stefano
Re: Leveled Compaction Strategy with a really intensive delete workload
Ok, I am reading a bit more about compaction subproperties here (http://docs.datastax.com/en/cql/3.1/cql/cql_reference/compactSubprop.html) and it seems that tombstone_threshold and unchecked_tombstone_compaction might come in handy. Does anybody know if changing any of these values (via ALTER) is possible without downtime, and how fast those values are picked up?

Cheers,
Stefano
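For what it's worth, both subproperties can be set with a plain schema ALTER. A sketch against a hypothetical keyspace and table (the names are made up; the subproperty names come from the DataStax page above):

```cql
-- Hypothetical keyspace/table. Note that ALTER replaces the whole
-- compaction map, so the class must be restated with the subproperties.
ALTER TABLE my_keyspace.my_table
WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'tombstone_threshold': '0.1',
    'unchecked_tombstone_compaction': 'true'
};
```

ALTER is an online schema change (no downtime); as far as I know, the new settings are picked up by compactions that start after the schema change has propagated.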
Re: Leveled Compaction Strategy with a really intensive delete workload
Hi,

For a delete-intensive workload (which translates to write-intensive), is there any reason to use leveled compaction? The recommendation seems to be that leveled compaction is suited for read-intensive workloads. Depending on your use case, you might be better off with the date-tiered or size-tiered strategy.

regards

On Sun, May 24, 2015 at 10:50 AM, Stefano Ortolani ostef...@gmail.com wrote:

> Hi all,
>
> I have a question re leveled compaction strategy that has been bugging me quite a lot lately. Based on what I understood, a compaction takes place when the SSTable gets to a specific size (10 times the size of its previous generation). My question is about an edge case where, due to a really intensive delete workload, the SSTable is promoted to the next level (say L1) and its size, because of the many evicted tombstones, falls back to 1/10 of its size (hence to a size compatible with the previous generation, L0). What happens in this case? If the next major compaction is set to happen when the SSTable is promoted to L2, well, that might take too long, and too many tombstones could then appear in the meanwhile (and queries might subsequently fail). Wouldn't it be more correct to flag the SSTable's generation to its previous value (namely, not changing it even if a major compaction took place)?
>
> Regards,
> Stefano Ortolani

--
http://khangaonkar.blogspot.com/
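As a toy sketch of the sizing being described (the function and defaults below are my own, not Cassandra code): LCS levels hold fixed-size sstables, and each level's target capacity is roughly 10x the previous one, with L0 acting only as the landing area for flushed sstables.

```python
# Toy sketch of LCS level sizing; function name and defaults are assumptions,
# not Cassandra's API. Each level's target capacity is ~fanout times the
# previous level's, in units of fixed-size sstables.

def level_target_mb(level, sstable_size_mb=160, fanout=10):
    """Approximate target capacity of an LCS level, in MB."""
    if level == 0:
        return 0  # L0 has no size target; freshly flushed sstables land here
    return sstable_size_mb * fanout ** level

# With a 160 MB sstable size: L1 targets ~1.6 GB, L2 ~16 GB, L3 ~160 GB.
print(level_target_mb(1))  # 1600
print(level_target_mb(2))  # 16000
```

This is why an sstable shrinking after tombstone eviction doesn't demote it: level membership is tracked per sstable, not inferred back from its size.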
Re: Leveled Compaction, number of SStables growing.
We run with 128mb, some run with 256mb. Leveled compaction creates fixed-size sstables by design, so this is the only way to lower the file count.

On Tue, Jul 9, 2013 at 2:56 PM, PARASHAR, BHASKARJYA JAY bp1...@att.com wrote:

> Hi,
>
> We recently switched from size-tiered compaction to leveled compaction. We made this change because our rows are frequently updated. We also have a lot of data. With size-tiered compaction, we had about 5-10 sstables per CF, so with about 15 CFs we had about 100 sstables. With the sstable default size of 5mb, now after leveled compaction, we have about 130k sstables and growing as the writes increase. There are a lot of compaction jobs pending. If we increase the sstable size to 20mb, that will be about 30k sstables, but it's still a lot.
>
> Is this common? Any solutions or hints on reducing the sstables are welcome.
>
> Thanks
> -Jay

--
http://twitter.com/tjake
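Jay's counts can be sanity-checked with simple division, since LCS sstables are fixed size (a toy sketch; the numbers are the ones from the mail, with 130k sstables at the old 5 MB default implying roughly 650 GB of data):

```python
# Back-of-envelope check: with fixed-size sstables, file count is roughly
# total data / sstable size. Numbers taken from the thread above.

def sstable_count(total_data_mb, sstable_size_mb):
    return round(total_data_mb / sstable_size_mb)

total_mb = 130_000 * 5            # ~130k files at 5 MB => ~650 GB of data
print(sstable_count(total_mb, 20))    # ~32500 files at 20 MB
print(sstable_count(total_mb, 256))   # ~2539 files at Jake's 256 MB
```

At the 128-256 MB sizes Jake mentions, the same data set drops to a few thousand files instead of 130k.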
RE: Leveled Compaction, number of SStables growing.
Thanks Jake. Guess we will have to increase the size.
RE: Leveled Compaction, number of SStables growing.
Thanks Sankalp... I will look at these.

From: sankalp kohli [mailto:kohlisank...@gmail.com]
Sent: Tuesday, July 09, 2013 3:22 PM

> Do you have a lot of sstables in L0? Since you moved from size-tiered compaction with a lot of data, it will take time for it to compact. You might want to increase the compaction settings to speed it up.
Re: Leveled Compaction, number of SStables growing.
Since you moved from size-tiered compaction, all your sstables are in L0. You might be hitting this. Copied from the code:

// LevelDB gives each level a score of how much data it contains vs its ideal amount, and
// compacts the level with the highest score. But this falls apart spectacularly once you
// get behind. Consider this set of levels:
//   L0: 988 [ideal: 4]
//   L1: 117 [ideal: 10]
//   L2: 12  [ideal: 100]
//
// The problem is that L0 has a much higher score (almost 250) than L1 (11), so what we'll
// do is compact a batch of MAX_COMPACTING_L0 sstables with all 117 L1 sstables, and put the
// result (say, 120 sstables) in L1. Then we'll compact the next batch of MAX_COMPACTING_L0,
// and so forth. So we spend most of our i/o rewriting the L1 data with each batch.
//
// If we could just do *all* L0 a single time with L1, that would be ideal. But we can't
// -- see the javadoc for MAX_COMPACTING_L0.
//
// LevelDB's way around this is to simply block writes if L0 compaction falls behind.
// We don't have that luxury.
//
// So instead, we
// 1) force compacting higher levels first, which minimizes the i/o needed to compact
//    optimally, which gives us a long term win, and
// 2) if L0 falls behind, we will size-tiered compact it to reduce read overhead until
//    we can catch up on the higher levels.
//
// This isn't a magic wand -- if you are consistently writing too fast for LCS to keep
// up, you're still screwed. But if instead you have intermittent bursts of activity,
// it can help a lot.
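The "falls apart" case in that comment is easy to reproduce with the naive LevelDB-style scoring (a toy model of my own, not Cassandra's actual code): rate each level by actual/ideal size and always pick the highest score, and a backed-up L0 wins every time, so L1 gets rewritten batch after batch.

```python
# Toy reproduction of the naive LevelDB-style scoring described in the
# comment above; my own simplification, not Cassandra's implementation.

def naive_pick_level(counts, ideals):
    # Score each level by how far over its ideal size it is, pick the worst.
    scores = [count / ideal for count, ideal in zip(counts, ideals)]
    return scores.index(max(scores))

# The exact numbers from the comment: L0 scores ~247, L1 ~12, L2 ~0.1,
# so the naive strategy keeps picking L0 and thrashing L1.
print(naive_pick_level([988, 117, 12], [4, 10, 100]))  # 0 (L0 always wins)
```

Cassandra's workaround, per the comment, is to invert this when behind: compact higher levels first and fall back to size-tiered compaction inside L0.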
Re: leveled compaction
It is SSTable counts in each level.

SSTables in each level: [40/4, 442/10, 97, 967, 7691, 0, 0, 0]

So you have 40 SSTables in L0, 442 in L1, 97 in L2, and so forth. '40/4' and '442/10' have numbers after the slash; those are the expected maximum number of SSTables in that level, and they are only displayed when you have more than that threshold.

On Friday, March 8, 2013 at 3:24 PM, Kanwar Sangha wrote:

> Hi – Can someone explain the meaning of the levelled compaction output in cfstats –
>
> SSTables in each level: [40/4, 442/10, 97, 967, 7691, 0, 0, 0]
> SSTables in each level: [61/4, 9, 92, 945, 8146, 0, 0, 0]
> SSTables in each level: [34/4, 1000/10, 100, 953, 8184, 0, 0, 0]
>
> Thanks,
> Kanwar
RE: leveled compaction
Cool! So if we exceed the threshold, is that an issue?
Re: leveled compaction
No, sstables are eventually compacted and moved to the next level.

--
Yuki Morishita
t:yukim (http://twitter.com/yukim)
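Yuki's reading of the bracketed list can be captured in a few lines (a hypothetical helper of my own, not part of Cassandra's tooling): "40/4" means 40 sstables in a level whose expected maximum is 4, while a bare number means the level is under its threshold.

```python
# Hypothetical parser for the cfstats "SSTables in each level" line;
# not part of Cassandra. Returns (count, expected_max_or_None) per level.

def parse_levels(line):
    inside = line[line.index('[') + 1 : line.index(']')]
    levels = []
    for item in inside.split(','):
        item = item.strip()
        if '/' in item:
            count, expected_max = (int(x) for x in item.split('/'))
        else:
            count, expected_max = int(item), None
        levels.append((count, expected_max))
    return levels

levels = parse_levels("SSTables in each level: [40/4, 442/10, 97, 967, 7691, 0, 0, 0]")
print(levels[0])  # (40, 4): 40 sstables in L0 vs an expected max of 4
print(levels[2])  # (97, None): L2 is within its threshold
```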
Re: leveled compaction and tombstoned data
On Sat, Nov 10, 2012 at 7:17 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

> No it does not exist. Rob and I might start a donation page and give the money to whoever is willing to code it. If someone would write a tool that would split an sstable into 4 smaller sstables (even an offline command line tool)

Something like that: https://github.com/pcmanus/cassandra/commits/sstable_split (adds an sstablesplit offline tool)

> I would paypal them a hundo.

Just tell me how you want to proceed :)

--
Sylvain
Re: leveled compaction and tombstoned data
> I would be careful with the patch that was referred to above; it hasn't been reviewed, and from a glance it appears that it will cause an infinite compaction loop if you get more than 4 SSTables at max size.

It will; you need to set up the max sstable size correctly.
Re: leveled compaction and tombstoned data
For some of our clusters, we have taken the periodic major compaction route. There are a few things to consider:

1) Once you start major compacting, depending on data size, you may be committed to doing it periodically, because you create one big file that will take forever to naturally compact against 3 like-sized files.
2) If you rely heavily on file cache (rather than large row caches), each major compaction effectively invalidates the entire file cache, because everything is written to one new large file.

--
Jim Cistaro

On 11/9/12 11:27 AM, Rob Coli rc...@palominodb.com wrote:

> On Thu, Nov 8, 2012 at 10:12 AM, B. Todd Burruss bto...@gmail.com wrote:
>
> > my question is would leveled compaction help to get rid of the tombstoned data faster than size tiered, and therefore reduce the disk space usage?
>
> You could also...
> 1) run a major compaction
> 2) code up sstablesplit
> 3) profit!
>
> This method incurs a management penalty if not automated, but is otherwise the most efficient way to deal with tombstones and obsolete data. :D
>
> =Rob
> --
> =Robert Coli
> AIMGTALK - rc...@palominodb.com
> YAHOO - rcoli.palominob
> SKYPE - rcoli_palominodb
Re: leveled compaction and tombstoned data
@Rob Coli: Does the sstablesplit function exist somewhere?
Re: leveled compaction and tombstoned data
Nope. I think at least once a week I hear someone suggest that one way to solve their problem is to write an sstablesplit tool. I'm pretty sure that:

Step 1. Write sstablesplit
Step 2. ???
Step 3. Profit!

--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety." -- Benjamin Franklin
carpe diem quam minimum credula postero
Re: leveled compaction and tombstoned data
No, it does not exist. Rob and I might start a donation page and give the money to whoever is willing to code it. If someone would write a tool that would split an sstable into 4 smaller sstables (even an offline command line tool), I would paypal them a hundo.
Re: leveled compaction and tombstoned data
On 2012-11-08, at 1:12 PM, B. Todd Burruss bto...@gmail.com wrote:

> we are having the problem where we have huge SSTABLEs with tombstoned data in them that is not being compacted soon enough (because size tiered compaction requires, by default, 4 like-sized SSTABLEs). this is using more disk space than we anticipated. we are very write heavy compared to reads, and we delete the data after N number of days (depends on the column family, but N is around 7 days). my question is would leveled compaction help to get rid of the tombstoned data faster than size tiered, and therefore reduce the disk space usage?

From my experience, levelled compaction makes space reclamation after deletes even less predictable than size-tiered. The reason is that deletes, like all mutations, are just recorded into sstables. They enter level0 and get slowly, over time, promoted upwards to levelN. Depending on your *total* mutation volume vs your data set size, this may be quite a slow process.

This is made even worse if the size of the data you're deleting (say, an entire row worth several hundred kilobytes) is to-be-deleted by a small row-level tombstone. If the row is sitting in level 4, the tombstone won't impact it until enough data has pushed it over all existing data in level3, level2, level1, level0.

Finally, to guard against the tombstone missing any data, the tombstone itself is not a candidate for removal (I believe even after gc_grace has passed) unless it's reached the highest populated level in levelled compaction. This means if you have 4 levels and issue a ton of deletes (even deletes that will never impact existing data), these tombstones are deadweight that cannot be purged until they hit level4.

For a write-heavy workload, I recommend you stick with size-tiered. You have several options at your disposal (compaction min/max thresholds, gc_grace) to move things along. If that doesn't help, I've heard of some fairly reputable people doing some fairly blasphemous things (major compactions every night).
Re: leveled compaction and tombstoned data
The rules for tombstone eviction are as follows (regardless of your compaction strategy):

1. gc_grace must have expired, and
2. no other fragments of the row may exist outside the sstables participating in the compaction.

For LCS there is no rule that tombstones can only be evicted at the highest level. They can be evicted at whichever level the row converges on. Depending on your use case this may mean it always happens at L4, or it may most often happen at L1 or L2.

On Fri, Nov 9, 2012 at 7:31 AM, Mina Naguib mina.nag...@adgear.com wrote: [...]

-- Ben Coverston DataStax -- The Apache Cassandra Company
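Ben's two rules reduce to a simple predicate. The following is a toy sketch (the function and sstable names are invented for illustration, not Cassandra internals):

```python
GC_GRACE_SECONDS = 10 * 24 * 3600  # Cassandra's default gc_grace_seconds

def tombstone_purgeable(tombstone_age_seconds, fragment_sstables, compacting_sstables):
    """Rule 1: gc_grace must have expired.
    Rule 2: every sstable holding a fragment of the row must be
    participating in this compaction -- otherwise dropping the
    tombstone could resurrect data in an sstable left out of it."""
    gc_grace_expired = tombstone_age_seconds >= GC_GRACE_SECONDS
    all_fragments_included = set(fragment_sstables) <= set(compacting_sstables)
    return gc_grace_expired and all_fragments_included

# Old tombstone, all row fragments included in the compaction: purgeable.
print(tombstone_purgeable(20 * 24 * 3600, {"sst-1", "sst-2"}, {"sst-1", "sst-2", "sst-3"}))
# A fragment in an sstable outside the compaction keeps the tombstone alive.
print(tombstone_purgeable(20 * 24 * 3600, {"sst-1", "sst-9"}, {"sst-1", "sst-2"}))
```

Rule 2 is why a major compaction (which includes every sstable) is the one operation guaranteed to let every expired tombstone drop.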
Re: leveled compaction and tombstoned data
On Thu, Nov 8, 2012 at 10:12 AM, B. Todd Burruss bto...@gmail.com wrote: my question is would leveled compaction help to get rid of the tombstoned data faster than size tiered, and therefore reduce the disk space usage? You could also: 1) run a major compaction, 2) code up sstablesplit, 3) profit! This method incurs a management penalty if not automated, but is otherwise the most efficient way to deal with tombstones and obsolete data. :D =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
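Rob's recipe works because the major compaction merges every fragment of each row (letting all eligible tombstones drop), and the split pass restores like-sized inputs for size-tiered compaction. A minimal sketch of the split step, assuming a simple fixed-size cut (this is illustrative, not the real sstablesplit tool):

```python
def split_sizes(total_bytes, target_bytes):
    """Cut one big post-major-compaction sstable back into roughly
    target-sized pieces, so size-tiered compaction again has several
    like-sized candidates instead of one monolith."""
    sizes = []
    while total_bytes > 0:
        sizes.append(min(target_bytes, total_bytes))
        total_bytes -= target_bytes
    return sizes

# A 1 GiB sstable split into 256 MiB pieces yields four equal files.
print(split_sizes(1024**3, 256 * 1024**2))
```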
Re: leveled compaction and tombstoned data
On 8.11.2012 19:12, B. Todd Burruss wrote: my question is would leveled compaction help to get rid of the tombstoned data faster than size tiered, and therefore reduce the disk space usage? Leveled compaction will kill your performance. Get the patch from JIRA for maximum sstable size per CF and force Cassandra to make smaller sstables; they expire faster.
Re: leveled compaction and tombstoned data
We are running DataStax Enterprise and cannot patch it. How bad is "kill your performance"? If it is so bad, why is it an option? On Thu, Nov 8, 2012 at 10:17 AM, Radim Kolar h...@filez.com wrote: [...]
Re: leveled compaction and tombstoned data
"Kill performance" is relative. Leveled compaction basically costs 2x the disk i/o. Look at iostat, etc., and see whether you have the headroom. There are also ways to bring up a test node and run leveled compaction on just that node; wish I had a URL handy, but hopefully someone else can find it. Also, if you're not using compression, check it out. On Thu, Nov 8, 2012 at 11:20 AM, B. Todd Burruss bto...@gmail.com wrote: [...] -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows "Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety." -- Benjamin Franklin carpe diem quam minimum credula postero
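As a rough back-of-envelope for the extra i/o: under LCS each byte is rewritten roughly once for every level it passes through, so the populated level count drives the cost. A simplified model (the 5 MB sstable size and fanout of 10 match the defaults discussed in these threads; the function is illustrative, not Cassandra's code):

```python
import math

def lcs_level_count(total_bytes, sstable_bytes=5 * 1024**2, fanout=10):
    """Levels needed when level N holds at most fanout**N sstables
    (L1 = 10 sstables, L2 = 100, and so on)."""
    sstables = max(1, math.ceil(total_bytes / sstable_bytes))
    level, capacity = 1, fanout
    while capacity < sstables:
        level += 1
        capacity += fanout ** level
    return level

# 100 GB at 5 MB per sstable is ~20480 sstables: L1..L4 hold only
# 10 + 100 + 1000 + 10000 = 11110, so a fifth level is needed, and a
# byte settling in L5 gets rewritten at each level it crosses.
print(lcs_level_count(100 * 1024**3))
```

Size-tiered, by contrast, rewrites a byte roughly once per tier merge, which is why write-heavy shops often find it cheaper.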
Re: leveled compaction and tombstoned data
LCS works well in specific circumstances, this blog post gives some good considerations: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction On Nov 8, 2012, at 1:33 PM, Aaron Turner synfina...@gmail.com wrote: [...]
Re: leveled compaction and tombstoned data
On Thu, Nov 8, 2012 at 1:33 PM, Aaron Turner synfina...@gmail.com wrote: There are also ways to bring up a test node and just run Level Compaction on that. Wish I had a URL handy, but hopefully someone else can find it. This rather handsome fellow wrote a blog about it: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling -Brandon
Re: leveled compaction and tombstoned data
http://www.datastax.com/docs/1.1/operations/tuning#testing-compaction-and-compression Write Survey mode. After you have it up and running you can modify the column family mbean to use LeveledCompactionStrategy on that node to see how your hardware/load fares with LCS. On Thu, Nov 8, 2012 at 11:33 AM, Aaron Turner synfina...@gmail.com wrote: [...] -- Ben Coverston DataStax -- The Apache Cassandra Company
Re: leveled compaction and tombstoned data
Also, to answer your question: LCS is well suited to workloads where overwrites and tombstones come into play. Tombstones are _much_ more likely to be merged with LCS than with STCS. I would be careful with the patch that was referred to above; it hasn't been reviewed, and from a glance it appears that it will cause an infinite compaction loop if you get more than 4 SSTables at max size. On Thu, Nov 8, 2012 at 11:41 AM, Brandon Williams dri...@gmail.com wrote: [...] -- Ben Coverston DataStax -- The Apache Cassandra Company
Re: leveled compaction and tombstoned data
Thanks for the links! I had forgotten about live sampling. On Thu, Nov 8, 2012 at 11:41 AM, Brandon Williams dri...@gmail.com wrote: [...]
Re: leveled compaction and tombstoned data
@ben, thanks. We will be deploying DSE 2.2.1 soon and will try to set up a traffic-sampling node so we can test leveled compaction. We essentially keep a rolling window of data written once: it is written, then after N days it is deleted, so it seems that leveled compaction should help. On Thu, Nov 8, 2012 at 11:53 AM, B. Todd Burruss bto...@gmail.com wrote: [...]
Re: leveled compaction - improve log message
If you would like to see a change, create a request for an improvement here: https://issues.apache.org/jira/browse/CASSANDRA Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 6/04/2012, at 12:51 PM, Radim Kolar wrote: It would be really helpful if leveled compaction printed the level into syslog. Demo:

INFO [CompactionExecutor:891] 2012-04-05 22:39:27,043 CompactionTask.java (line 113) Compacting ***LEVEL 1*** [SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19690-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19688-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19691-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19700-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19686-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19696-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19687-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19695-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19689-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19694-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19693-Data.db')]

INFO [CompactionExecutor:891] 2012-04-05 22:39:57,299 CompactionTask.java (line 221) *** LEVEL 1 *** Compacted to
[/var/lib/cassandra/data/rapidshare/querycache-hc-19701-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19702-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19703-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19704-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19705-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19706-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19707-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19708-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19709-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19710-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19711-Data.db,]. 59,643,011 to 57,564,216 (~96% of original) bytes for 590,909 keys at 1.814434MB/s. Time: 30,256ms.
Re: leveled compaction - improve log message
CompactionExecutor doesn't have level information available to it; it just compacts the sstables it's told to. But if you enable debug logging on LeveledManifest you'd see what you want. (Compaction candidates for L{} are {}) 2012/4/5 Radim Kolar h...@filez.com: [...] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: leveled compaction - improve log message
For details, open conf/log4j-server.properties and add the following configuration:

log4j.logger.org.apache.cassandra.db.compaction.LeveledManifest=DEBUG

fyi. maki 2012/4/10 Jonathan Ellis jbel...@gmail.com: [...]
Re: Leveled Compaction Strategy; Expected number of files over time?
It looks like what you're seeing is this: stress far outpaced the ability of compaction to keep up (which is normal for our default settings, which prioritize maintaining request throughput over compaction), so LCS will grab a bunch of L0 sstables, compact them together with L1 (resulting in a spike of L1 sstables), then compact those upwards into higher levels, gradually lowering the sstable count.

It's unclear how to improve the "LCS can't keep up" case [1]. But it's worth noting that a single large stress insert run, consisting as it does of a large volume of unique rows, is the worst case for LCS. This is the primary reason LCS is not the default: if you have an append-mostly write load with few overwrites or deletes, LCS will do a lot of extra i/o for no real benefit.

[1] https://issues.apache.org/jira/browse/CASSANDRA-3854

On Sun, Jan 22, 2012 at 10:26 PM, Chris Burroughs chris.burrou...@gmail.com wrote: I inserted a large number of keys into a single node using stress.java [1] and let things sit for a while (several hours with no more inserts). After a bit I decided something might be up and started sampling the number of files in the data directory for 250 minutes while I played The Legend of Zelda. At the start there were 78291 files, and at the end 78599. All I see in the log is a lot of "Compacting to" and "Compacted" messages. The output of compactionstats also seemed odd:

$ ./bin/nodetool -h localhost -p 10101 compactionstats
pending tasks: 3177
compaction type   keyspace    column family   bytes compacted   bytes total   progress
Compaction        Keyspace1   Standard1       250298718         0             n/a

Below is a graph showing an oscillation in the number of files. Is this how leveled compaction strategy is expected to behave? If so, is it ever 'done'? http://img836.imageshack.us/img836/7294/levelcompactionfiles.png

[1] (ran three times) ./bin/stress -d HOST --random -l 1 -o insert -c 25 -e ONE --average-size-values -C 100 -t 75 -n 7500 with this config (duplicate options in original, but I don't think that should matter):

update column family Standard1 with rows_cached=100 and keys_cached=0 and compaction_strategy = 'LeveledCompactionStrategy' and compaction_strategy_options = {sstable_size_in_mb:10} and compaction_strategy_options = {sstable_size_in_mb:10} and compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64} and row_cache_provider = 'ConcurrentLinkedHashCacheProvider' and row_cache_keys_to_save = 2 and row_cache_save_period = 120;

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
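The oscillation follows from the per-level quotas: L0 accepts flushes at full speed while each higher level holds at most 10x the previous one, so a post-stress backlog drains in waves. A toy measure of the remaining work (names invented for illustration, not Cassandra internals):

```python
def level_quota(level, fanout=10):
    """Maximum sstables allowed at a level: L1 = 10, L2 = 100, ...
    (L0 is special and simply accumulates flushed sstables)."""
    return fanout ** level

def pending_sstables(counts, fanout=10):
    """How many sstables sit above quota across L1+, plus the whole
    L0 backlog; compaction is 'done' when this reaches zero."""
    over = sum(max(0, n - level_quota(l, fanout))
               for l, n in counts.items() if l > 0)
    return over + counts.get(0, 0)

# Mid-catch-up snapshot: an L0 backlog of 40 plus 5 excess files in L1
# (L2 is still under its quota of 100).
print(pending_sstables({0: 40, 1: 15, 2: 90}))
```

So after a bulk load LCS does eventually finish: once every level is within quota and L0 is drained, the file count stabilizes.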
Re: Leveled Compaction in cassandra1.0 may not be perfect
On Fri, Dec 2, 2011 at 8:13 PM, liangfeng liangf...@made-in-china.com wrote: 1. There is no implementation in Cassandra 1.0 to ensure the conclusion "Only enough space for 10x the sstable size needs to be reserved for temporary use by compaction," so one particular compaction may still need a large amount of free disk space. 2. Leveled compaction *will* do too much i/o, especially when we use RandomPartitioner (because md5 tokens will cause many sstables to overlap all the time). These two points may make compaction unpredictable when it occurs.

We specifically create non-overlapping, fixed-size sstables so that there is no such thing as a special or unusually large compaction. (All sstables are sorted by row key in token order, so partitioner choice has no effect here.) You should look at the org.apache.cassandra.db.compaction package and read the original leveldb implementation notes at http://leveldb.googlecode.com/svn/trunk/doc/impl.html for more details. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
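The non-overlapping invariant Jonathan describes can be stated as a tiny check over token ranges (an illustrative sketch, not Cassandra's code):

```python
def level_is_disjoint(ranges):
    """L1+ invariant: sstables within a level cover disjoint token
    ranges, so any row lives in at most one sstable per level and a
    compaction's input size is bounded regardless of partitioner."""
    ordered = sorted(ranges)
    return all(prev_hi < lo for (_, prev_hi), (lo, _) in zip(ordered, ordered[1:]))

print(level_is_disjoint([(0, 9), (20, 29), (10, 19)]))  # disjoint: True
print(level_is_disjoint([(0, 15), (10, 19)]))           # overlapping: False
```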
Re: Leveled Compaction in cassandra1.0 may not be perfect
Jonathan Ellis jbellis at gmail.com writes: You should look at the org.apache.cassandra.db.compaction package and read the original leveldb implementation notes at http://leveldb.googlecode.com/svn/trunk/doc/impl.html for more details. There is an important rule in http://leveldb.googlecode.com/svn/trunk/doc/impl.html: "We also switch to a new output file when the key range of the current output file has grown enough to overlap more than ten level-(L+2) files. This last rule ensures that a later compaction of a level-(L+1) file will not pick up too much data from level-(L+2)." But in LeveledCompactionTask we switch to a new output file only when the sstable reaches the fixed size, so I don't think Cassandra 1.0 can avoid an unusually large compaction.
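The leveldb rule quoted above can be sketched as follows: while writing a compaction's output in key order, cut a new file as soon as the current file's key range would overlap more than the allowed number of level-(L+2) files. This is a simplified model with integer keys (max_overlap is lowered in the demo to keep it small):

```python
def overlap_count(lo, hi, lplus2_ranges):
    """How many level-(L+2) files the key range [lo, hi] intersects."""
    return sum(1 for s, e in lplus2_ranges if not (e < lo or s > hi))

def split_output(keys, lplus2_ranges, max_overlap=10):
    """Switch to a new output file once the current one's key range
    would overlap more than max_overlap level-(L+2) files, so a later
    compaction of this file cannot pick up too much L+2 data."""
    files, current = [], []
    for k in sorted(keys):
        if current and overlap_count(current[0], k, lplus2_ranges) > max_overlap:
            files.append(current)
            current = [k]
        else:
            current.append(k)
    if current:
        files.append(current)
    return files

# L+2 holds files covering [0,9], [10,19], ...; with max_overlap=3 the
# output is cut whenever it would span a fourth L+2 file.
lplus2 = [(i * 10, i * 10 + 9) for i in range(20)]
print([len(f) for f in split_output(range(0, 100, 5), lplus2, max_overlap=3)])
```

This bounds the data a later L+1 compaction drags in, at the cost of sometimes producing output files smaller than the fixed size target.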
Re: Leveled Compaction in cassandra1.0 may not be perfect
I think you're confusing temporary space used during a compaction operation with total i/o done by compaction. Leveled compaction *will* do more i/o than size-tiered, because it's enforcing tighter guarantees on how compacted the data is.

On Fri, Dec 2, 2011 at 1:01 AM, liangfeng liangf...@made-in-china.com wrote: Hello, everyone! In this doc (http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra) I found the conclusion "Only enough space for 10x the sstable size needs to be reserved for temporary use by compaction." I don't know how this conclusion was reached, but I guess the author may have taken it from a compaction rule in leveldb. In a leveldb doc (http://leveldb.googlecode.com/svn/trunk/doc/impl.html) I found this rule: "We also switch to a new output file when the key range of the current output file has grown enough to overlap more than ten level-(L+2) files." As this rule describes, 10x the sstable size is enough for every compaction. Unfortunately, Cassandra 1.0 may not implement this rule, so I think the conclusion may be arbitrary. Everyone, what do you think about it?

Of course, implementing this rule may not be hard, but it could cause another problem: compaction may generate many small sstables that each overlap just 10 sstables in the next level, especially when we use RandomPartitioner. That could cause many compactions when these small sstables have to be promoted to the next level. In my testing, I wrote 120 GB of data to one Cassandra node, and the node spent 24 hours compacting it with leveled compaction. So I don't think leveled compaction is perfect. What do you think about it, my friends?

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Leveled Compaction in cassandra1.0 may not be perfect
Jonathan Ellis jbellis at gmail.com writes: I think you're confusing temporary space used during a compaction operation with total i/o done by compaction. Leveled compaction *will* do more i/o than size-tiered, because it's enforcing tighter guarantees on how compacted the data is.

Yes. In fact, I want to make two points in this topic. 1. There is no implementation in Cassandra 1.0 to ensure the conclusion "Only enough space for 10x the sstable size needs to be reserved for temporary use by compaction," so one particular compaction may still need a large amount of free disk space. 2. Leveled compaction *will* do too much i/o, especially when we use RandomPartitioner (because md5 tokens will cause many sstables to overlap all the time). These two points may make compaction unpredictable when it occurs. Thanks!