> Keep in mind that there is additional data storage overhead, including
> timestamps and column names. Because the schema can vary from row to row,
> the column names are stored with each row, in addition to the data. Disk
> space-efficiency is not a primary design goal for Cassandra.

If the rows that are 200k (or was it 100k?) are not single columns but rather lots and lots of smaller columns, then this overhead will be significant. In addition, during compaction a column family can temporarily use up to twice its usual amount of disk space (during a major compaction, all data will at some point exist in duplicate).

--
/ Peter Schuller
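
To put rough numbers on the two effects above, here is a back-of-the-envelope sketch. It is not Cassandra's actual accounting: the function names are made up for illustration, and the 15 bytes of fixed per-column overhead (timestamp, length fields, a flag byte) is an assumption that will differ between versions and on-disk formats. Row-level index and bloom filter overhead is ignored.

```python
# Rough estimate only. FIXED_OVERHEAD_BYTES is an assumed figure for the
# per-column bookkeeping (timestamp, length fields, flag); check your version.
FIXED_OVERHEAD_BYTES = 15


def estimate_row_bytes(num_columns, name_bytes, value_bytes,
                       fixed_overhead_bytes=FIXED_OVERHEAD_BYTES):
    """Approximate stored size of one row whose columns all have the same shape."""
    per_column = name_bytes + value_bytes + fixed_overhead_bytes
    return num_columns * per_column


def peak_disk_during_major_compaction(live_bytes):
    """Worst case sketched above: old and new copies of all data coexist."""
    return 2 * live_bytes


# About 200 KB of payload stored as one big column with a 10-byte name.
single = estimate_row_bytes(num_columns=1, name_bytes=10, value_bytes=200_000)

# The same 200 KB of payload split into 10,000 columns of 20 bytes each.
many = estimate_row_bytes(num_columns=10_000, name_bytes=10, value_bytes=20)

print(single)                                   # 200025
print(many)                                     # 450000
print(many / 200_000)                           # 2.25 times the raw payload
print(peak_disk_during_major_compaction(many))  # 900000
```

Under these assumptions the single-column row carries almost no overhead, while the many-small-columns layout more than doubles the stored size, and the headroom needed for a major compaction doubles it again.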