Thanks for the answer, and sorry for the misunderstanding. What I meant is that I don't send any delete requests from the client, so it is safe to set gc_grace to 0. TTL is what I use for data clean-up. I am not running manual compactions; I tried that once, but it took a long time to finish and I won't have that much off-peak time in production to run it. I even set the compaction throughput to unlimited and it didn't help much.
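For reference, the write path is roughly the sketch below (illustrative only: the table and column names are made up, and the CQL is just built as strings here instead of being sent through a driver):

from uuid import uuid4

TTL_SECONDS = 24 * 60 * 60  # 1-day retention handled purely by TTL, no client deletes

# gc_grace can be 0 because we never issue DELETE from the client;
# the only expiring data comes from TTL.
CREATE_CF = """
CREATE TABLE event_data (
    event_id uuid PRIMARY KEY,
    data blob
) WITH gc_grace_seconds = 0;
"""

def insert_event_cql(event_id, data_hex):
    # Every write carries the TTL, so rows expire on their own after one day.
    return ("INSERT INTO event_data (event_id, data) "
            "VALUES (%s, 0x%s) USING TTL %d;" % (event_id, data_hex, TTL_SECONDS))

if __name__ == "__main__":
    print(CREATE_CF)
    print(insert_event_cql(uuid4(), "cafebabe"))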
Disk usage just keeps growing, even though I know there is enough space to store one day of data. What do you think about time-range partitioning: creating a new column family for each partition and dropping it once you know that all of its records have expired? I have 5 nodes.
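Concretely, the partition bookkeeping I have in mind is something like the sketch below (one day per partition; the _YYYYMMDD column family suffix and all names are only illustrative, and the DROP statements are just printed rather than executed):

from datetime import date, timedelta

PARTITION_SIZE = timedelta(days=1)    # one column family set per day
RETENTION_DAYS = 7                    # keep data for 7 days
BASE_CFS = ("event_data_cf", "timeseries_cf", "timeseries_inv_cf")

def cf_names(day):
    # Column family names carry the partition's start date as a suffix,
    # e.g. event_data_cf_20130528 (naming is illustrative).
    suffix = day.strftime("%Y%m%d")
    return ["%s_%s" % (cf, suffix) for cf in BASE_CFS]

def droppable_partitions(today, known_partitions):
    # A partition can be dropped once its newest possible record is older
    # than the retention period, i.e. the whole day lies outside the window.
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [day for day in known_partitions if day + PARTITION_SIZE <= cutoff]

if __name__ == "__main__":
    today = date(2013, 5, 28)
    known = [today - timedelta(days=n) for n in range(10)]  # last 10 daily partitions
    print("write to:", cf_names(today))
    for day in droppable_partitions(today, known):
        for name in cf_names(day):
            print("DROP TABLE %s;" % name)

The point is that expiry becomes a metadata operation (dropping a whole column family) instead of a compaction problem.

Cem

On Tue, May 28, 2013 at 9:37 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
> Also, how many nodes are you running?
>
> From: cem <cayiro...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Tuesday, May 28, 2013 1:17 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: data clean up problem
>
> Thanks for the answer, but it is already set to 0 since I don't do any deletes.
>
> Cem
>
> On Tue, May 28, 2013 at 9:03 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> You need to change the gc_grace time of the column family. It defaults to 10 days, so by default the tombstones will not go away for 10 days.
>
> On Tue, May 28, 2013 at 2:46 PM, cem <cayiro...@gmail.com> wrote:
> Hi Experts,
>
> We have a general problem with cleaning up data from the disk. I need to free the disk space after the retention period, and the customer wants to dimension the disk space based on that.
>
> After running multiple performance tests with a TTL of 1 day, we saw that compaction couldn't keep up with the request rate. Disks were getting full after 3 days, and there were also a lot of sstables older than 1 day after 3 days.
>
> Things that we tried:
>
> - Change the compaction strategy to leveled. (helped a bit but not much)
>
> - Use a big sstable size (10G) with leveled compaction to have more aggressive compaction. (helped a bit but not much)
>
> - Upgrade Cassandra from 1.0 to 1.2 to use TTL histograms. (didn't help at all, since its key-overlap estimation algorithm produces a 100% match, although we don't have...)
>
> Our column family structure is like this:
>
> Event_data_cf: (we store event data; event_id is randomly generated and each event has attributes like location=london)
>
> row         data
> event id    data blob
>
> timeseries_cf: (the key is the attribute that we want to index, e.g. location=london; we didn't use secondary indexes because the indexes are dynamic)
>
> row          data
> index key    time series of event ids (event1_id, event2_id, ...)
>
> timeseries_inv_cf: (this is used for removing an event by its row key)
>
> row         data
> event id    set of index keys
>
> Candidate solution: implementing time-range partitions.
>
> Each partition will have its own column family set and will be managed by the client.
>
> Suppose that you want a 7-day retention period. Then you can configure the partition size as 1 day and have 7 active partitions at any time, and drop inactive partitions (older than 7 days). Dropping immediately removes the data from the disk (with the proper cassandra.yaml configuration).
>
> Storing an event:
>
> Find the current partition p1
> Store the event data to Event_data_cf_p1
> Store the indexes to timeseries_cf_p1
> Store the inverted indexes to timeseries_inv_cf_p1
>
> A time-range query with an index:
>
> Find all partitions that belong to that time range
> Read starting from the first partition until you reach the limit
> .....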
>
> Could you please provide your comments and concerns?
>
> Is there any other option that we can try?
>
> What do you think about the candidate solution?
>
> Does anyone have the same issue? How would you solve it in another way?
>
> Thanks in advance!
>
> Cem
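P.S. To make the read path of the candidate solution a bit more concrete, this is roughly how I picture a time-range query fanning out over the daily partitions (sketch only; fetch_index_row is a stand-in for whatever client call would read one timeseries_cf row):

from datetime import date, timedelta

def partitions_for_range(start_day, end_day):
    # All daily partitions that overlap the queried time range, oldest first.
    days = []
    day = start_day
    while day <= end_day:
        days.append(day)
        day += timedelta(days=1)
    return days

def time_range_query(index_key, start_day, end_day, limit, fetch_index_row):
    # Walk the partitions in order and stop as soon as 'limit' event ids
    # have been collected; later partitions are never touched.
    event_ids = []
    for day in partitions_for_range(start_day, end_day):
        cf = "timeseries_cf_%s" % day.strftime("%Y%m%d")
        for event_id in fetch_index_row(cf, index_key):
            event_ids.append(event_id)
            if len(event_ids) >= limit:
                return event_ids
    return event_ids

if __name__ == "__main__":
    # Stand-in for a real read: pretend each daily partition holds two events.
    def fake_fetch(cf, key):
        return ["%s:%s:event%d" % (cf, key, i) for i in (1, 2)]

    ids = time_range_query("location=london", date(2013, 5, 26), date(2013, 5, 28),
                           limit=5, fetch_index_row=fake_fetch)
    print("\n".join(ids))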