Re: Cassandra disk space utilization

Peter Schuller Wed, 07 Jul 2010 09:22:22 -0700

> Keep in mind that there is additional data storage overhead, including
> timestamps and column names. Because the schema can vary from row to row,
> the column names are stored with each row, in addition to the data. Disk
> space-efficiency is not a primary design goal for Cassandra.


If the rows that are 200k (or was it 100k) are not single columns but
rather lots and lots of smaller columns, then this per-column overhead
will be significant.
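
As a rough illustration of why the column count matters, here is a
back-of-the-envelope sketch. The 15-byte fixed cost per column (length
fields, timestamp, deletion flag) and the 8-byte column names are
assumptions for the sake of the example, not measured values; plug in
your own numbers.

```python
def row_size_on_disk(num_columns, name_bytes, value_bytes, per_column_overhead=15):
    """Approximate bytes stored for one row.

    per_column_overhead is an ASSUMED fixed cost per column (timestamp,
    length fields, flags); it is a parameter precisely because the real
    figure depends on the Cassandra version and storage format.
    """
    return num_columns * (name_bytes + value_bytes + per_column_overhead)


def overhead_ratio(num_columns, name_bytes, value_bytes, per_column_overhead=15):
    """Stored bytes divided by raw value bytes (1.0 would mean no overhead)."""
    payload = num_columns * value_bytes
    stored = row_size_on_disk(num_columns, name_bytes, value_bytes, per_column_overhead)
    return stored / payload


# The same 200,000 bytes of values, laid out two ways (hypothetical layouts):
single_column = overhead_ratio(num_columns=1, name_bytes=8, value_bytes=200_000)
many_columns = overhead_ratio(num_columns=10_000, name_bytes=8, value_bytes=20)

print(f"one 200k column:        {single_column:.4f}x")
print(f"10,000 20-byte columns: {many_columns:.2f}x")
```

Under these assumptions the single-column row costs almost nothing
extra, while the many-small-columns layout more than doubles the bytes
stored for the same payload.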

In addition, compaction can temporarily use up to twice the amount of
disk occupied by a column family (during a major compaction all data
will at some point exist in duplicate).
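
For capacity planning, the worst case can be sketched as below. This
assumes the compaction discards nothing, so the newly written data is as
large as the existing data and both copies are on disk until the old
files are removed. The function names and the `other_bytes` parameter are
made up for this example.

```python
def peak_bytes_during_major_compaction(live_bytes):
    """Worst-case footprint of one column family while it is compacted.

    Assumes nothing is discarded, so the old files and the new output
    coexist at full size until the compaction finishes.
    """
    return 2 * live_bytes


def has_compaction_headroom(live_bytes, disk_capacity_bytes, other_bytes=0):
    """True if the disk can hold the worst-case peak plus other data.

    other_bytes is a hypothetical figure for everything else on the same
    disk (other column families, commit log, and so on).
    """
    peak = peak_bytes_during_major_compaction(live_bytes)
    return peak + other_bytes <= disk_capacity_bytes


GB = 1024 ** 3

# A 300 GB column family on a 500 GB disk: fits at rest, not while compacting.
print(has_compaction_headroom(300 * GB, 500 * GB))  # False
# The same column family on a 1 TB disk shared with 100 GB of other data.
print(has_compaction_headroom(300 * GB, 1024 * GB, other_bytes=100 * GB))  # True
```

In other words, a column family should stay below roughly half of the
available disk if major compactions are expected to run safely.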

-- 
/ Peter Schuller
