Re: size tiered compaction - improvement

2012-04-25 Thread Radim Kolar

On 18.4.2012 16:22, Jonathan Ellis wrote:

> It's not that simple, unless you have an append-only workload.

I have an append-only workload, and probably most people using TTL do too.


Re: size tiered compaction - improvement

2012-04-18 Thread Radim Kolar



> Any compaction pass over A will first convert the TTL data into tombstones.
>
> Then, any subsequent pass that includes A *and all other sstables
> containing rows with the same key* will drop the tombstones.

That's why I proposed attaching a TTL to the entire CF. Tombstones would not
be needed.


RE: size tiered compaction - improvement

2012-04-18 Thread Viktor Jevdokimov
Our use case requires Column TTL, not CF TTL, because it is variable, not 
constant.


Best regards/ Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-01112 Vilnius,
Lithuania





Re: size tiered compaction - improvement

2012-04-18 Thread Igor
For my use case it would be nice to have a per-CF TTL (to protect myself
from application bugs and from storage leaks due to a missed TTL), but it
seems you can't avoid tombstones even in this case, and especially not if
you change the CF TTL at runtime.






Re: size tiered compaction - improvement

2012-04-18 Thread Jonathan Ellis
It's not that simple, unless you have an append-only workload.  (See
discussion on
https://issues.apache.org/jira/browse/CASSANDRA-3974.)




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


RE: size tiered compaction - improvement

2012-04-18 Thread Bryce Godfrey
A per-CF or per-row TTL would also be very useful for me, for our time-series
data.




Re: size tiered compaction - improvement

2012-04-17 Thread Jonathan Ellis
On Sat, Apr 14, 2012 at 3:27 AM, Radim Kolar h...@filez.com wrote:
> forceUserDefinedCompaction would be more useful if you could do compaction
> on 2 tables.

You absolutely can.  That's what the user defined part is: you give
it the exact list of sstables you want compacted.



Re: size tiered compaction - improvement

2012-04-17 Thread Jonathan Ellis
On Sat, Apr 14, 2012 at 4:08 AM, Igor i...@4friends.od.ua wrote:
> Assume I insert all my data with TTL=2 weeks, and that we have sstable A,
> which was created a week ago at time T. I then know that right now it
> contains:
>
> 1) some data that was inserted no later than T and may not have expired yet
> 2) some amount of data that was already close to expiration (by TTL) at
> time T, but has had no chance to be wiped out because, up to now,
> size-tiered compaction has not included A in any compaction.
>
> A large amount of the data from 2) expired in the week after T and has
> probably passed the gc_grace period, so it should be wiped by any
> compaction of table A.

Any compaction pass over A will first convert the TTL data into tombstones.

Then, any subsequent pass that includes A *and all other sstables
containing rows with the same key* will drop the tombstones.
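
As a rough illustration of the two passes described above, here is a toy model in Python. It is only a sketch under stated assumptions: the Cell class and the compact() function are hypothetical names invented for this example, gc_grace is ignored, and shadowing of older values is not modelled. It is not how Cassandra implements compaction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    key: str                            # row key
    column: str                         # column name
    expires_at: Optional[float] = None  # None means no TTL
    tombstone: bool = False


def compact(inputs: List[List[Cell]], others: List[List[Cell]], now: float) -> List[Cell]:
    """One toy compaction pass over `inputs`; `others` are sstables left out."""
    keys_elsewhere = {cell.key for sstable in others for cell in sstable}
    merged = []
    for sstable in inputs:
        for cell in sstable:
            if cell.tombstone:
                # A tombstone may only be dropped when no sstable outside
                # this pass still holds a row with the same key.
                if cell.key in keys_elsewhere:
                    merged.append(cell)
            elif cell.expires_at is not None and cell.expires_at <= now:
                # Expired TTL data is first rewritten as a tombstone.
                merged.append(Cell(cell.key, cell.column, tombstone=True))
            else:
                merged.append(cell)
    return merged


# sstable A holds an expired column; sstable B holds another column of the same row.
a = [Cell("k1", "c1", expires_at=10.0), Cell("k2", "c1")]
b = [Cell("k1", "c2")]

pass1 = compact([a], [b], now=20.0)         # expired column becomes a tombstone
pass2 = compact([pass1], [b], now=20.0)     # B not included: tombstone is kept
pass3 = compact([pass1, b], [], now=20.0)   # A and B together: tombstone is dropped
```

In this toy model the tombstone for row k1 survives every pass that leaves out sstable B, which is the situation an old, rarely compacted sstable ends up in.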



Re: size tiered compaction - improvement

2012-04-17 Thread Igor
Thank you Jonathan, I missed this point about converting TTL data to
tombstones first.


When you say:

> You absolutely can.  That's what the user defined part is: you give
> it the exact list of sstables you want compacted.

does it mean that I can pass a list (not just one) of sstables as the second
parameter of userDefinedCompaction?







Re: size tiered compaction - improvement

2012-04-17 Thread Jonathan Ellis
On Tue, Apr 17, 2012 at 11:26 PM, Igor i...@4friends.od.ua wrote:
>> You absolutely can.  That's what the user defined part is: you give
>> it the exact list of sstables you want compacted.
>
> Does it mean that I can pass a list (not just one) of sstables as the second
> parameter of userDefinedCompaction?

If you want them all compacted together into one big sstable, yes.
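
A small sketch of how the two arguments might be assembled when several sstables are to be compacted together. The thread confirms only that a list of sstables is accepted; that the list is passed as a single comma-separated string is an assumption here, and the helper name user_defined_compaction_args and the keyspace "myks" are made up for illustration.

```python
import os


def user_defined_compaction_args(keyspace, sstable_paths):
    """Build the two string arguments for forceUserDefinedCompaction.

    ASSUMPTION: the second argument is a comma-separated list of
    *-Data.db file names; the thread does not state the separator.
    """
    names = [os.path.basename(path) for path in sstable_paths]
    bad = [name for name in names if not name.endswith("-Data.db")]
    if bad:
        raise ValueError("not sstable data files: %s" % ", ".join(bad))
    return keyspace, ",".join(names)


# "myks" is a placeholder keyspace; the file names come from the listing in this thread.
args = user_defined_compaction_args(
    "myks",
    ["/data/myks/resultcache-hc-13087-Data.db",
     "/data/myks/resultcache-hc-13096-Data.db"],
)
```

Check the separator against your Cassandra version before relying on it.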



Re: size tiered compaction - improvement

2012-04-14 Thread Radim Kolar

On 4.4.2012 6:52, Igor wrote:
> Here is a small Python script I run once per day. You have to adjust the
> size and/or age limits in the 'if' statement. I also use the mx4j interface
> for the JMX calls.

forceUserDefinedCompaction would be more useful if you could do
compaction on 2 tables. If I run it on a single table, it doesn't shrink, and
it does not solve my problem: having sstables at a size which will
never be compacted because no other sstable of similar size will be created.


Re: size tiered compaction - improvement

2012-04-14 Thread Igor

I'll try to explain in more detail:

Assume I insert all my data with TTL=2 weeks, and that we have sstable A,
which was created a week ago at time T. I then know that right now it
contains:


1) some data that was inserted no later than T and may not have expired yet
2) some amount of data that was already close to expiration (by TTL) at
time T, but has had no chance to be wiped out because, up to now,
size-tiered compaction has not included A in any compaction.


A large amount of the data from 2) expired in the week after T and has
probably passed the gc_grace period, so it should be wiped by any
compaction of table A.


Or did I miss something?





size tiered compaction - improvement

2012-04-03 Thread Radim Kolar
There is a problem with the size-tiered compaction design: it compacts
together sstables of similar size.


Sometimes you end up with sstables sitting on disk forever (the Feb 23
files below) because no other sstables of similar size were created, and
probably never will be. A flushed sstable is about 11-16 MB.


The next level is about 90 MB,
then 5 x 90 MB gets compacted into a 400 MB sstable,
and 5 x 400 MB ~ 2 GB.

The problem is that a 400 MB sstable is too small to be compacted with
these 3 x 720 MB ones.


-rw-r--r--  1 root  wheel   165M Feb 23 17:03 resultcache-hc-13086-Data.db
-rw-r--r--  1 root  wheel   772M Feb 23 17:04 resultcache-hc-13087-Data.db
-rw-r--r--  1 root  wheel   156M Feb 23 17:06 resultcache-hc-13091-Data.db
-rw-r--r--  1 root  wheel   716M Feb 23 17:18 resultcache-hc-13096-Data.db
-rw-r--r--  1 root  wheel   734M Feb 23 17:29 resultcache-hc-13101-Data.db
-rw-r--r--  1 root  wheel   5.0G Mar 14 09:38 resultcache-hc-13923-Data.db
-rw-r--r--  1 root  wheel   1.9G Mar 16 22:41 resultcache-hc-14084-Data.db
-rw-r--r--  1 root  wheel   1.9G Mar 21 15:11 resultcache-hc-14460-Data.db
-rw-r--r--  1 root  wheel   1.9G Mar 27 05:22 resultcache-hc-14694-Data.db
-rw-r--r--  1 root  wheel   2.0G Mar 31 04:57 resultcache-hc-14851-Data.db
-rw-r--r--  1 root  wheel   112M Mar 31 06:30 resultcache-hc-14922-Data.db
-rw-r--r--  1 root  wheel   577M Apr  1 19:25 resultcache-hc-14943-Data.db
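
To make the "similar size" grouping concrete, here is a simplified Python sketch that buckets the sizes from the listing above. It assumes the (0.5-1.5) range mentioned in this post, applied to each bucket's running average; the function name bucket_by_size is made up, and the real strategy may differ in its details.

```python
def bucket_by_size(sizes_mb, low=0.5, high=1.5):
    """Group sizes so each one lies within (low, high) x its bucket's average."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes_mb):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            if low * average < size < high * average:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets


# Approximate sizes in MB read off the listing (1.9G ~ 1946, 2.0G = 2048, 5.0G = 5120).
listing_mb = [165, 772, 156, 716, 734, 5120, 1946, 1946, 1946, 2048, 112, 577]
groups = bucket_by_size(listing_mb)
```

On these sizes the sketch yields groups of 3, 4, 4 and 1 sstables. Whether a group actually gets compacted depends on the configured minimum number of sstables per compaction, which this post does not state.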

The compaction strategy needs to take sstable timestamps into account too:
older sstables should have an increased chance of being compacted.
For example, an sstable from today would be compacted with other sstables
within a range of (0.5-1.5) of its size, and this range would widen with
sstable age; a 1-month-old sstable would have a range of, for example, (0.2-1.8).
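
The proposal can be sketched in Python as follows. Only the two endpoints, (0.5-1.5) for an sstable from today and (0.2-1.8) at one month, come from this post. The linear interpolation between them, the cap at 30 days, and the names size_range and may_compact_together are assumptions made for illustration.

```python
def size_range(age_days, young=(0.5, 1.5), old=(0.2, 1.8), old_age_days=30.0):
    """Size-ratio window for an sstable of the given age (linear widening, assumed)."""
    t = min(max(age_days / old_age_days, 0.0), 1.0)  # 0.0 today, 1.0 at >= 30 days
    low = young[0] + t * (old[0] - young[0])
    high = young[1] + t * (old[1] - young[1])
    return low, high


def may_compact_together(size, age_days, other_size):
    """True if other_size falls inside the age-dependent window around size."""
    low, high = size_range(age_days)
    return low * size <= other_size <= high * size


# Hypothetical sizes in MB: a 1000 MB sstable and a 300 MB candidate partner.
fresh = may_compact_together(1000, 0, 300)    # window is 500..1500 MB
aged = may_compact_together(1000, 30, 300)    # window is 200..1800 MB
```

Here the 300 MB sstable is rejected while the 1000 MB one is new and accepted once it is a month old.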


Re: size tiered compaction - improvement

2012-04-03 Thread igor
If you know for sure that you will free a lot of space by compacting some old
sstable, then you can call UserdefinedCompaction for that sstable (you can do
this from cron). There is also a ticket in JIRA with a discussion of
per-sstable expired-column and tombstone counters.

 





Re: size tiered compaction - improvement

2012-04-03 Thread Jonathan Ellis
Twitter tried a timestamp-based compaction strategy in
https://issues.apache.org/jira/browse/CASSANDRA-2735.  The conclusion
was, "this actually resulted in a lot more compactions than the
SizeTieredCompactionStrategy. The increase in IO was not acceptable
for our use and therefore stopped working on this patch."






Re: size tiered compaction - improvement

2012-04-03 Thread Radim Kolar

On 3.4.2012 23:04, i...@4friends.od.ua wrote:

> If you know for sure that you will free a lot of space by compacting some
> old sstable, then you can call UserdefinedCompaction for that sstable (you
> can do this from cron). There is also a ticket in JIRA with a discussion
> of per-sstable expired-column and tombstone counters.

Are you talking about the forceUserDefinedCompaction operation of the
CompactionManager MBean? It takes 2 arguments, with no description of them.
I never got it to work; it returned NoSuchElementException.


Re: size tiered compaction - improvement

2012-04-03 Thread igor
The first is the keyspace name, the second is the sstable name (like
transaction-hc-1024-Data.db).

 





Re: size tiered compaction - improvement

2012-04-03 Thread Igor
Here is a small Python script I run once per day. You have to adjust the size
and/or age limits in the 'if' statement. I also use the mx4j interface for
the JMX calls.


#!/usr/bin/env python
# Python 2 script (uses urllib2 and the print statement).

import sys,os,glob,time,urllib2

CASSANDRA_DATA='/spool1/cassandra/data'
DONTTOUCH=('system',)   # keyspaces to skip

now = time.time()

def main():
    kss=[ks for ks in os.listdir(CASSANDRA_DATA) if ks not in DONTTOUCH]
    for ks in kss:
        # data files only; skip temporary sstables that are still being written
        sstables=[sst for sst in glob.glob(CASSANDRA_DATA+'/'+ks+'/'+'*-Data.db') if sst.find('-tmp-')==-1]

        for table in sstables:
            st = os.stat(table)
            age=(now-st.st_mtime)/24/3600     # days since last modification
            size=st.st_size/1024/1024/1024    # whole GB (integer division in Python 2)
            # The comparison operators were lost in the archive; '>=' is the presumed reading.
            if (age >= 5 and size >= 5) or age >= 10:
                table_name = table.split('/')[-1]
                print "compacting", ks, table_name

                # mx4j HTTP call: value0 is the keyspace, value1 the sstable file name
                url='http://localhost:8081/invoke?operation=forceUserDefinedCompaction&objectname=org.apache.cassandra.db%%3Atype%%3DCompactionManager&value0=%s&type0=java.lang.String&value1=%s&type1=java.lang.String'%(ks, table_name)

                r=urllib2.urlopen(url)
                time.sleep(1)

if __name__=='__main__':
    main()


