Re: size tiered compaction - improvement
On 18.4.2012 16:22, Jonathan Ellis wrote: It's not that simple, unless you have an append-only workload.

I have an append-only workload, and probably most people using TTL do too.
Re: size tiered compaction - improvement
Any compaction pass over A will first convert the TTL data into tombstones. Then, any subsequent pass that includes A *and all other sstables containing rows with the same key* will drop the tombstones.

That's why I proposed attaching a TTL to the entire CF: tombstones would not be needed.
RE: size tiered compaction - improvement
Our use case requires Column TTL, not CF TTL, because it is variable, not constant.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.
Re: size tiered compaction - improvement
For my use case it would be nice to have a per-CF TTL (to protect myself from an application bug and from a storage leak due to a missed TTL), but it seems you can't avoid tombstones even in this case, and also if you change the CF TTL at runtime.

On 04/18/2012 03:06 PM, Viktor Jevdokimov wrote: Our use case requires Column TTL, not CF TTL, because it is variable, not constant.
Re: size tiered compaction - improvement
It's not that simple, unless you have an append-only workload. (See the discussion on https://issues.apache.org/jira/browse/CASSANDRA-3974.)

On Wed, Apr 18, 2012 at 4:57 AM, Radim Kolar h...@filez.com wrote: Any compaction pass over A will first convert the TTL data into tombstones. Then, any subsequent pass that includes A *and all other sstables containing rows with the same key* will drop the tombstones. That's why I proposed attaching a TTL to the entire CF: tombstones would not be needed.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
RE: size tiered compaction - improvement
A per-CF or per-row TTL would be very useful for me as well, with our timeseries data.

-----Original Message-----
From: Igor [mailto:i...@4friends.od.ua]
Sent: Wednesday, April 18, 2012 6:06 AM
To: user@cassandra.apache.org
Subject: Re: size tiered compaction - improvement

For my use case it would be nice to have a per-CF TTL (to protect myself from an application bug and from a storage leak due to a missed TTL), but it seems you can't avoid tombstones even in this case, and also if you change the CF TTL at runtime.
Re: size tiered compaction - improvement
On Sat, Apr 14, 2012 at 3:27 AM, Radim Kolar h...@filez.com wrote: forceUserDefinedCompaction would be more useful if you could do compaction on 2 tables.

You absolutely can. That's what the user-defined part is: you give it the exact list of sstables you want compacted.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: size tiered compaction - improvement
On Sat, Apr 14, 2012 at 4:08 AM, Igor i...@4friends.od.ua wrote: Assume I insert all my data with TTL=2 weeks, and say we have sstable A, which was created a week ago at time T, so I know that right now it contains: 1) some data that was inserted no later than T and may not have expired yet; 2) some data that was already close to expiration due to TTL at time T, but still had no chance to be wiped out, because up to the current moment size-tiered compaction did not involve A in any compactions. A large amount of the data from 2) became expired in the week after time T and probably passed the gc_grace period, so it should be wiped by any compaction on table A.

Any compaction pass over A will first convert the TTL data into tombstones. Then, any subsequent pass that includes A *and all other sstables containing rows with the same key* will drop the tombstones.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
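Jonathan's two-step description (expired TTL cells are first rewritten as tombstones; a tombstone can only be dropped once every sstable that may still hold the row's key takes part in a compaction) can be sketched with a toy model. This is an illustration only, not Cassandra's actual compaction code: the `compact` function, the (kind, expiry) cell tuples, and the one-cell-per-key simplification are all made up for the example, and real compaction also weighs cell timestamps and the per-CF gc_grace setting.

```python
def compact(compacting, others, now, gc_grace=0):
    """Toy merge of the `compacting` sstables (dicts of key -> (kind, expiry)).
    `others` are sstables left on disk: tombstones for keys that also appear
    there must be kept, because those sstables may still shadow older data."""
    blocked = set().union(*(set(o) for o in others)) if others else set()
    merged = {}
    for sst in compacting:
        for key, (kind, expiry) in sst.items():
            # step 1: an expired TTL cell is rewritten as a tombstone
            if kind == "data" and expiry is not None and expiry <= now:
                kind = "tombstone"
            # step 2: drop the tombstone only if gc_grace has passed and no
            # sstable outside this compaction can still contain the key
            if kind == "tombstone" and key not in blocked and expiry + gc_grace <= now:
                continue
            merged[key] = (kind, expiry)
    return merged

A = {"k1": ("data", 100)}   # TTL'd cell in sstable A, expires at t=100
B = {"k1": ("data", 50)}    # another expired cell for the same key, in B

step1 = compact([A], [B], now=200)        # A alone: tombstone written, kept
step2 = compact([step1, B], [], now=200)  # output of step 1 + B: dropped
```

Compacting A by itself turns the expired cell into a tombstone but cannot purge it, since B (outside the compaction) still holds k1; only the second pass, which includes every sstable with that key, drops it.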
Re: size tiered compaction - improvement
Thank you Jonathan, I missed this point about converting TTL data to tombstones first. When you say "You absolutely can. That's what the user-defined part is: you give it the exact list of sstables you want compacted", does it mean that I can use a list (not just one) of sstables as the second parameter for userDefinedCompaction?

On 04/18/2012 05:53 AM, Jonathan Ellis wrote: Any compaction pass over A will first convert the TTL data into tombstones. Then, any subsequent pass that includes A *and all other sstables containing rows with the same key* will drop the tombstones.
Re: size tiered compaction - improvement
On Tue, Apr 17, 2012 at 11:26 PM, Igor i...@4friends.od.ua wrote: does it mean that I can use a list (not just one) of sstables as the second parameter for userDefinedCompaction?

If you want them all compacted together into one big sstable, yes.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
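A user-defined compaction over an exact list of sstables can be driven the same way Igor's cron script (later in this thread) drives single-table compactions: through the mx4j HTTP adaptor. The helper below only builds the invoke URL; the host/port and the assumption that the mbean's second argument accepts a comma-separated list of sstable names are mine, so treat it as a sketch rather than a documented API.

```python
from urllib.parse import urlencode

MX4J = "http://localhost:8081/invoke"  # assumed mx4j HTTP adaptor address

def user_defined_compaction_url(keyspace, sstables):
    """Build the mx4j URL that invokes
    CompactionManager.forceUserDefinedCompaction for several sstables at
    once (comma-separated second argument - an assumption based on the
    discussion above)."""
    params = {
        "operation": "forceUserDefinedCompaction",
        "objectname": "org.apache.cassandra.db:type=CompactionManager",
        "value0": keyspace, "type0": "java.lang.String",
        "value1": ",".join(sstables), "type1": "java.lang.String",
    }
    return MX4J + "?" + urlencode(params)
```

Usage would be something like `urllib.request.urlopen(user_defined_compaction_url("resultcache", ["resultcache-hc-13086-Data.db", "resultcache-hc-13091-Data.db"]))` to merge those two tables into one big sstable.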
Re: size tiered compaction - improvement
On 4.4.2012 6:52, Igor wrote: Here is a small python script I run once per day. You have to adjust the size and/or age limits in the 'if' operator. Also I use the mx4j interface for jmx calls.

forceUserDefinedCompaction would be more useful if you could do compaction on 2 tables. If I run it on a single table, it doesn't shrink, and it doesn't solve my problem: having sstables of a size that will never be compacted, because no other sstable of similar size will ever be created.
Re: size tiered compaction - improvement
I'll try to explain in more detail. Assume I insert all my data with TTL=2 weeks, and say we have sstable A, which was created a week ago at time T, so I know that right now it contains:

1) some data that was inserted no later than T and may not have expired yet;
2) some data that was already close to expiration due to TTL at time T, but still had no chance to be wiped out, because up to the current moment size-tiered compaction did not involve A in any compactions.

A large amount of the data from 2) became expired in the week after time T and probably passed the gc_grace period, so it should be wiped by any compaction on table A. Or did I miss something?

On 04/14/2012 11:27 AM, Radim Kolar wrote: forceUserDefinedCompaction would be more useful if you could do compaction on 2 tables. If I run it on a single table, it doesn't shrink, and it doesn't solve my problem: having sstables of a size that will never be compacted, because no other sstable of similar size will ever be created.
size tiered compaction - improvement
There is a problem with the size-tiered compaction design: it compacts together tables of similar size. Sometimes it can happen that some sstables sit on disk forever (the Feb 23 ones below) because no other similar-sized tables were created, and probably never will be. A flushed sstable is about 11-16 MB, the next level is about 90 MB; then 5x 90 MB gets compacted into a ~400 MB sstable, and 5x 400 MB into ~2 GB. The problem is that a 400 MB sstable is too small to be compacted against these 3x ~720 MB ones.

-rw-r--r--  1 root  wheel   165M Feb 23 17:03 resultcache-hc-13086-Data.db
-rw-r--r--  1 root  wheel   772M Feb 23 17:04 resultcache-hc-13087-Data.db
-rw-r--r--  1 root  wheel   156M Feb 23 17:06 resultcache-hc-13091-Data.db
-rw-r--r--  1 root  wheel   716M Feb 23 17:18 resultcache-hc-13096-Data.db
-rw-r--r--  1 root  wheel   734M Feb 23 17:29 resultcache-hc-13101-Data.db
-rw-r--r--  1 root  wheel   5.0G Mar 14 09:38 resultcache-hc-13923-Data.db
-rw-r--r--  1 root  wheel   1.9G Mar 16 22:41 resultcache-hc-14084-Data.db
-rw-r--r--  1 root  wheel   1.9G Mar 21 15:11 resultcache-hc-14460-Data.db
-rw-r--r--  1 root  wheel   1.9G Mar 27 05:22 resultcache-hc-14694-Data.db
-rw-r--r--  1 root  wheel   2.0G Mar 31 04:57 resultcache-hc-14851-Data.db
-rw-r--r--  1 root  wheel   112M Mar 31 06:30 resultcache-hc-14922-Data.db
-rw-r--r--  1 root  wheel   577M Apr  1 19:25 resultcache-hc-14943-Data.db

The compaction strategy needs to take sstable timestamps into account too: older tables should have an increased chance of getting compacted. For example, a table from today would be compacted with other tables in the range (0.5-1.5) of its size, and this range would widen with sstable age; a 1-month-old table would have a range of, for example, (0.2-1.8).
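The widening-with-age idea in the last paragraph can be sketched as a candidate-selection rule. The numbers simply mirror the example above ((0.5-1.5) today, roughly (0.2-1.8) at one month); the function names and the linear widening schedule are my own illustration, not an actual Cassandra strategy.

```python
def size_window(size, age_days):
    """Size range in which an sstable of `size` looks for compaction
    partners; widens linearly from (0.5, 1.5)x at age 0 toward roughly
    (0.2, 1.8)x at 30 days, then stays there."""
    widen = 0.3 * min(age_days / 30.0, 1.0)
    return (size * (0.5 - widen), size * (1.5 + widen))

def partners(sstable, others):
    """Other sstables whose size falls inside our age-dependent window."""
    lo, hi = size_window(sstable["size"], sstable["age_days"])
    return [o for o in others if lo <= o["size"] <= hi]
```

With a rule like this, the 577 MB table from the listing above would become eligible against the ~720-770 MB ones once it is a month old, since its window has widened to roughly (115 MB, 1039 MB).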
Re: size tiered compaction - improvement
If you know for sure that you will free a lot of space by compacting some old table, then you can call forceUserDefinedCompaction for this table (you can do this from cron). There is also a ticket in JIRA with a discussion of per-sstable expired-column and tombstone counters.
Re: size tiered compaction - improvement
Twitter tried a timestamp-based compaction strategy in https://issues.apache.org/jira/browse/CASSANDRA-2735. The conclusion was: "this actually resulted in a lot more compactions than the SizeTieredCompactionStrategy. The increase in IO was not acceptable for our use and [we] therefore stopped working on this patch."

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: size tiered compaction - improvement
On 3.4.2012 23:04, i...@4friends.od.ua wrote: if you know for sure that you will free a lot of space compacting some old table, then you can call forceUserDefinedCompaction for this table (you can do this from cron).

Are you talking about the CompactionManager forceUserDefinedCompaction mbean? It takes 2 arguments, with no description of them. I never got it to work; a NoSuchElementException was returned.
Re: size tiered compaction - improvement
The first is the keyspace name, the second is the sstable name (like transaction-hc-1024-Data.db).

-----Original Message-----
From: Radim Kolar h...@filez.com
To: user@cassandra.apache.org
Sent: Wed, 04 Apr 2012 3:14
Subject: Re: size tiered compaction - improvement

Are you talking about the CompactionManager forceUserDefinedCompaction mbean? It takes 2 arguments, with no description of them. I never got it to work; a NoSuchElementException was returned.
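When scripting against this mbean it helps to pull the pieces out of a filename like transaction-hc-1024-Data.db. The pattern below (column family, on-disk format version, generation, component) matches the names quoted in this thread; the helper itself is an illustrative sketch I wrote for this reply, not an official parser.

```python
import re

# <cf>-<version>-<generation>-<component>, e.g. transaction-hc-1024-Data.db
SSTABLE_RE = re.compile(r"^(?P<cf>.+)-(?P<version>[a-z]+)-(?P<gen>\d+)-(?P<component>.+)$")

def parse_sstable_name(name):
    """Split an sstable filename into its components; generation as int."""
    m = SSTABLE_RE.match(name)
    if m is None:
        raise ValueError("unrecognized sstable filename: %r" % name)
    parts = m.groupdict()
    parts["gen"] = int(parts["gen"])
    return parts
```

For example, parse_sstable_name("resultcache-hc-13086-Data.db") yields cf "resultcache", version "hc", generation 13086, component "Data.db".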
Re: size tiered compaction - improvement
Here is a small python script I run once per day. You have to adjust the size and/or age limits in the 'if' operator. Also, I use the mx4j interface for the jmx calls.

#!/usr/bin/env python
import os, glob, time, urllib2

CASSANDRA_DATA = '/spool1/cassandra/data'
DONTTOUCH = ('system',)
now = time.time()

def main():
    kss = [ks for ks in os.listdir(CASSANDRA_DATA) if ks not in DONTTOUCH]
    for ks in kss:
        sstables = [sst for sst in glob.glob(CASSANDRA_DATA + '/' + ks + '/*-Data.db')
                    if sst.find('-tmp-') == -1]
        for table in sstables:
            st = os.stat(table)
            age = (now - st.st_mtime) / 24 / 3600   # days
            size = st.st_size / 1024 / 1024 / 1024  # GB
            if (age >= 5 and size >= 5) or age >= 10:
                table_name = table.split('/')[-1]
                print "compacting", ks, table_name
                url = ('http://localhost:8081/invoke?operation=forceUserDefinedCompaction'
                       '&objectname=org.apache.cassandra.db%%3Atype%%3DCompactionManager'
                       '&value0=%s&type0=java.lang.String'
                       '&value1=%s&type1=java.lang.String' % (ks, table_name))
                r = urllib2.urlopen(url)
                time.sleep(1)

if __name__ == '__main__':
    main()

On 04/04/2012 07:47 AM, i...@4friends.od.ua wrote: The first is the keyspace name, the second is the sstable name (like transaction-hc-1024-Data.db).