Hi!

The DataStax documentation <http://www.datastax.com/docs/1.0/ddl/column_family> explains which CFs are a good fit for compression:

When to Use Compression

Compression is best suited for column families where there are many rows, with each row having the same columns, or at least many columns in common. For example, a column family containing user data such as username, email, etc., would be a good candidate for compression. The more similar the data across rows, the greater the compression ratio will be, and the larger the gain in read performance.

Compression is not as good a fit for column families where each row has a different set of columns, or where there are just a few very wide rows. Dynamic column families such as this will not yield good compression ratios.

I have many column families where rows share some columns and also have a varying number of unique columns per row. For example, I have a CF where each row has ~13 shared columns plus anywhere from zero to many unique columns.

Will such a CF be a good fit for compression?

More generally, is there a rule of thumb for how many shared columns (or what percentage of shared columns) makes a CF a good fit for compression?

Thanks,

Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956
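
P.S. Absent a rule of thumb, one way to get a rough feel for the effect described in the quoted passage is to measure it on synthetic data. The sketch below is only an approximation under stated assumptions: it does not use Cassandra or its on-disk format. It serializes rows as flat `name=value` text, compresses them in fixed-size chunks with Python's `zlib` (a Deflate implementation), and compares the resulting ratios. The column names, the 64 KB chunk size, the row count, and the helpers `make_row`, `compression_ratio`, and `estimate` are all hypothetical choices made for illustration.

```python
import random
import zlib

CHUNK_SIZE = 64 * 1024  # assumed chunk size, chosen only for illustration


def make_row(rng, shared_names, n_unique):
    """Build one row as a flat byte string of name=value pairs."""
    parts = [f"{name}={rng.randrange(10**6)}" for name in shared_names]
    for _ in range(n_unique):
        # Unique column names are random hex strings, so they rarely repeat.
        parts.append(f"{rng.getrandbits(64):016x}={rng.randrange(10**6)}")
    return ";".join(parts).encode("ascii") + b"\n"


def compression_ratio(data, chunk_size=CHUNK_SIZE):
    """Uncompressed size divided by compressed size, chunk by chunk."""
    compressed = 0
    for start in range(0, len(data), chunk_size):
        compressed += len(zlib.compress(data[start:start + chunk_size]))
    return len(data) / compressed


def estimate(max_unique, n_rows=2000, seed=0):
    """Ratio for rows with 13 shared columns and 0..max_unique unique ones."""
    rng = random.Random(seed)
    shared = [f"shared_col_{i:02d}" for i in range(13)]
    data = b"".join(
        make_row(rng, shared, rng.randint(0, max_unique))
        for _ in range(n_rows)
    )
    return compression_ratio(data)


if __name__ == "__main__":
    for max_unique in (0, 5, 50):
        print(f"up to {max_unique:2d} unique columns: "
              f"ratio {estimate(max_unique):.2f}")
```

In this toy model the ratio drops as the share of unique column names grows, which matches the direction the documentation describes. The absolute numbers say nothing about a real cluster, so a test against a copy of the actual data would be the more reliable check.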