Thanks for the answer.

Sorry for the misunderstanding. What I meant is that I don't send delete
requests from the client, so it is safe to set gc_grace to 0. TTL is used for
data clean-up. I am not running manual compactions; I tried that once, but it
took a long time to finish and I will not have that much off-peak time in
production to run it. I even set the compaction throughput to unlimited and it
didn't help that much.
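
To be concrete, this is roughly the setup I mean. It is only a sketch using
the DataStax Python driver with placeholder keyspace/table names, not our
actual code:

    # Sketch only: assumed contact point, keyspace and table names.
    import uuid
    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('event_ks')

    # Safe only because the client never issues explicit DELETEs.
    session.execute("ALTER TABLE event_data_cf WITH gc_grace_seconds = 0")

    # Every write carries a 1-day TTL, so expiry is the only clean-up path.
    session.execute(
        "INSERT INTO event_data_cf (event_id, data) VALUES (%s, %s) USING TTL 86400",
        (uuid.uuid4(), b'event payload'),
    )

    # Compaction throttling was lifted separately with:
    #   nodetool setcompactionthroughput 0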

Disk usage just keeps growing, even though I know there is enough space to
store 1 day of data.

What do you think about time-range partitioning: creating a new column family
for each partition and dropping it when you know that all of its records have
expired?
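
To make the idea concrete, here is a rough sketch of the client-side
bookkeeping I have in mind (plain Python; the names such as
Event_data_cf_20130528, the 1-day partition size and the 7-day retention are
only illustrative, not our actual code):

    import datetime

    PARTITION_SIZE = datetime.timedelta(days=1)   # one column-family set per day
    RETENTION_DAYS = 7                            # keep 7 daily partitions alive
    BASE_CFS = ('Event_data_cf', 'timeseries_cf', 'timeseries_inv_cf')

    def partition_suffix(ts):
        """Map an event timestamp to its partition, e.g. '20130528'."""
        return ts.strftime('%Y%m%d')

    def cf_names(suffix):
        """All column families that make up one partition."""
        return ['%s_%s' % (base, suffix) for base in BASE_CFS]

    def partitions_in_range(start, end):
        """Partition suffixes a time-range query has to touch, oldest first."""
        day = datetime.datetime(start.year, start.month, start.day)
        suffixes = []
        while day <= end:
            suffixes.append(partition_suffix(day))
            day += PARTITION_SIZE
        return suffixes

    def expired_suffixes(existing, now):
        """Partitions past the retention window; dropping their CFs frees disk at once."""
        cutoff = partition_suffix(now - RETENTION_DAYS * PARTITION_SIZE)
        return [s for s in existing if s < cutoff]

    # A write goes to the current partition's column families:
    now = datetime.datetime.utcnow()
    print(cf_names(partition_suffix(now)))
    # A 3-day query touches at most 4 daily partitions:
    print(partitions_in_range(now - datetime.timedelta(days=3), now))

Dropping a partition would then just mean dropping each of those column
families once its suffix falls behind the retention cut-off.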

I have 5 nodes.

Cem.




On Tue, May 28, 2013 at 9:37 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:

> Also, how many nodes are you running?
>
> From: cem <cayiro...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Tuesday, May 28, 2013 1:17 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: data clean up problem
>
> Thanks for the answer, but it is already set to 0 since I don't do any
> deletes.
>
> Cem
>
>
> On Tue, May 28, 2013 at 9:03 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> You need to change the gc_grace time of the column family. It defaults to
> 10 days, so by default the tombstones will not go away for 10 days.
>
>
> On Tue, May 28, 2013 at 2:46 PM, cem <cayiro...@gmail.com> wrote:
> Hi Experts,
>
>
> We have a general problem with cleaning up data from the disk. I need to
> free the disk space after the retention period, and the customer wants to
> dimension the disk space based on that.
>
> After running multiple performance tests with a TTL of 1 day, we saw that
> compaction couldn't keep up with the request rate. Disks were getting full
> after 3 days, and at that point there were also a lot of sstables older than
> 1 day.
>
> Things that we tried:
>
> -Change the compaction strategy to leveled. (helped a bit but not much)
>
> -Use a big sstable size (10G) with leveled compaction to have more
> aggressive compaction. (helped a bit but not much)
>
> -Upgrade Cassandra from 1.0 to 1.2 to use TTL histograms (didn't help at
> all, since its key-overlap estimation algorithm generates a 100% match,
> although we don't have...)
>
> Our column family structure is like this:
>
> Event_data_cf: (we store event data. Event_id is randomly generated and
> each event has attributes like location=london)
>
> row                  data
>
> event id          data blob
>
> timeseries_cf: (key is the attribute that we want to index. It can be
> location=london; we didn't use secondary indexes because the indexes are
> dynamic.)
>
> row                  data
>
> index key       time series of event id (event1_id, event2_id….)
>
> timeseries_inv_cf: (this is used for removing events by event row key.)
>
> row                  data
>
> event id          set of index keys
>
> Candidate Solution: Implementing time range partitions.
>
> Each partition will have its own column family set and will be managed by
> the client.
>
> Suppose that you want a 7-day retention period. Then you can configure the
> partition size as 1 day and have 7 active partitions at any time, and drop
> the inactive partitions (older than 7 days). Dropping will immediately
> remove the data from the disk. (With the proper cassandra.yaml
> configuration)
>
> Storing an event:
>
> Find the current partition p1
>
> store the event data to Event_data_cf_p1
>
> store the indexes to timeseries_cf_p1
>
> store the inverted indexes to timeseries_inv_cf_p1
>
>
> A time range query with an index:
>
> Find all the partitions that belong to that time range
>
> Read from the first partition onwards until you reach the limit
>
> .....
>
> Could you please provide your comments and concerns?
>
> Is there any other option that we can try?
>
> What do you think about the candidate solution?
>
> Does anyone have the same issue? How would you solve it in another way?
>
>
> Thanks in advance!
>
> Cem
>
>
>
