Hello everyone,

We are running a 3-node Cassandra 1.1.5 cluster. Each node has a 3 TB RAID 0
volume for the data directory, a separate disk for the commit log, 12 cores
and 24 GB of RAM (12 GB allocated to the Cassandra heap).

We now have 1.1 TB worth of data per node (RF = 2).

Our data input is between 20 and 30 GB per day, depending on the operating
conditions of the data sources.

The problem is that we have to start deleting data, or we will run out of
disk capacity.
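A rough sketch of the capacity arithmetic, for anyone who wants to check the
numbers. It assumes (none of this is measured) that the 20-30 GB/day figure is
raw input before replication, that data spreads evenly over the 3 nodes, that
1 TB = 1000 GB, and it ignores compaction headroom. The 30-day window is the
"e.g. 1 month" from the notes further down; the function names are made up.

```python
# Rough disk-capacity arithmetic for the cluster described in this message.
# Assumptions (not measured): the 20-30 GB/day figure is raw input before
# replication, data is spread evenly over the 3 nodes, 1 TB = 1000 GB, and
# compaction headroom and on-disk overhead are ignored.

NODES = 3
RF = 2
DISK_GB = 3000.0  # 3 TB RAID 0 data volume per node
USED_GB = 1100.0  # 1.1 TB currently on disk per node


def per_node_growth_gb(daily_input_gb: float) -> float:
    """On-disk growth per node per day: each write is stored RF times,
    spread over NODES nodes."""
    return daily_input_gb * RF / NODES


def days_until_full(daily_input_gb: float) -> float:
    """Days until the data volume is completely full if nothing is deleted."""
    return (DISK_GB - USED_GB) / per_node_growth_gb(daily_input_gb)


def steady_state_gb(daily_input_gb: float, retention_days: int = 30) -> float:
    """Live data per node once deletes keep pace with a retention window."""
    return per_node_growth_gb(daily_input_gb) * retention_days


for daily in (20, 30):
    print(f"{daily} GB/day: +{per_node_growth_gb(daily):.1f} GB/node/day, "
          f"full in ~{days_until_full(daily):.0f} days, "
          f"~{steady_state_gb(daily):.0f} GB/node live with a 30-day window")
```

Under these assumptions the live data of a 30-day window (roughly 400-600 GB
per node) would fit comfortably. The harder part is likely that deleted data
only frees space once its tombstones are older than gc_grace_seconds (864000
seconds, i.e. 10 days, by default) and a compaction merges them with the data
they shadow.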

From reading around, we see two options:

1. Start issuing regular major compactions (nodetool compact).
     - This is not recommended:
            - Stops minor compactions.
            - Major performance hit on the node (very bad for us, because
              we need to be ingesting data all the time).

2. Switch to the Leveled compaction strategy.
      - It is said to help with deletes and disk space usage. Can someone
        confirm?
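Some back-of-the-envelope arithmetic for the two options, using simplified
models that are assumptions rather than measurements of this cluster: a major
compaction under size-tiered compaction rewrites all SSTables into one, so in
the worst case (nothing purged) it needs free space equal to the data being
compacted; leveled compaction is modeled with a fanout of 10, where level i
holds 10**i equally sized SSTables. The helper names are hypothetical.

```python
# Back-of-the-envelope comparison of the two options, under simplified models.
# Assumptions: a major compaction rewrites every SSTable into one new file, so
# in the worst case (nothing purged) it needs free space equal to the data
# size; leveled compaction uses a fanout of 10 between levels (L0 ignored).


def major_compaction_fits(data_gb: float, disk_gb: float) -> bool:
    """Worst case: the old SSTables stay on disk until the new one is written."""
    free_gb = disk_gb - data_gb
    return free_gb >= data_gb


def deepest_level_fraction(levels: int, fanout: int = 10) -> float:
    """Fraction of data in the deepest level when L1..L<levels> are all full,
    with level i holding fanout**i SSTables of equal size."""
    capacities = [fanout ** i for i in range(1, levels + 1)]
    return capacities[-1] / sum(capacities)


print(major_compaction_fits(1100, 3000))  # True
print(major_compaction_fits(1600, 3000))  # False
print(round(deepest_level_fraction(3), 3))  # 0.901
```

In this model a worst-case major compaction stops fitting once a node holds
more than half of its disk (1.5 TB here). With three full levels about 90% of
the data sits in the deepest level, and because SSTables within a level do not
overlap, a row appears at most once per level, so obsolete copies in the upper
levels are bounded by roughly 10% of the data. That is the reasoning usually
given for the commonly cited ~10% space overhead of leveled compaction.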

Can anyone suggest which option (if either) is better? Are there better
solutions?

Some context:
- Our usage pattern is write once, read once (export) and delete once.
Basically, we are using Cassandra as a data buffer between our collection
points and a long-term backup system; it should hold a time window of data
(e.g. one month) before the data is deleted from the cluster.

- Due to financial and space constraints, it is very unlikely that we can
add more nodes to the cluster.

- We were thinking of relying on automatic minor compactions to free up
space, but given the way the Size-Tiered compaction strategy seems to work,
we will hit capacity before any disk space is freed. This seems very strange:
no matter how much disk space each node has, the data files will keep getting
larger, and you will eventually hit the same problem of minor compactions not
freeing space fast enough. Can someone confirm?
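The size-tiered concern can be illustrated with a toy model. This is a
simplification, not Cassandra's actual algorithm: every flush produces one
SSTable of size 1, SSTables count as "similar" only when exactly equal in size,
and a tier is merged as soon as it holds 4 SSTables (4 matches Cassandra's
default minimum compaction threshold). The model also assumes merges drop
nothing. The simulate() function is hypothetical.

```python
# Toy model of size-tiered compaction (STCS), not Cassandra's actual code.
# Assumptions: each flush yields one SSTable of size 1, SSTables are "similar"
# only when exactly equal in size, a tier merges as soon as it holds
# MIN_THRESHOLD tables, and merges drop nothing (no data purged).

from collections import Counter

MIN_THRESHOLD = 4  # Cassandra's default minimum compaction threshold


def simulate(flushes: int) -> tuple[Counter, dict[int, int]]:
    """Return the final SSTable sizes (size -> count) and, for each output
    size, the flush number at which a merge first produced that size."""
    tiers: Counter = Counter()
    first_seen: dict[int, int] = {}
    for flush in range(1, flushes + 1):
        tiers[1] += 1
        size = 1
        # Cascade: one merge can fill the next tier up and trigger another.
        while tiers[size] >= MIN_THRESHOLD:
            tiers[size] -= MIN_THRESHOLD
            size *= MIN_THRESHOLD
            tiers[size] += 1
            first_seen.setdefault(size, flush)
    return +tiers, first_seen  # unary + drops zero counts


sizes, first_seen = simulate(256)
print(dict(sizes))  # {256: 1}
print(first_seen)  # {4: 4, 16: 16, 64: 64, 256: 256}
```

In this model the oldest SSTable in a tier holding N units of data waits for
3N further units to be flushed before it is merged again, so space held by
deleted data in the largest SSTables is reclaimed at ever longer intervals.
That is consistent with the behaviour described above, although real STCS
groups SSTables by approximate rather than exact size.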

Cheers,
Alex
