Thanks for the link. #2 was not meant to be a trick question; it just came out like that :). What I was after is the overhead associated with a large number of keyspaces and column families (I didn't mean empty memtables :). If a few keyspaces have 20 or so column families, with a percentage of rows cached, does this affect write performance to other keyspaces in the cluster?

On Wed, Oct 20, 2010 at 12:01 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> On Wed, Oct 20, 2010 at 2:47 PM, CassUser CassUser <cassu...@gmail.com> wrote:
> > Hey,
> >
> > As I understand it writes go directly to the commit log. Once a threshold
> > has been reached the data is shipped to a memtable, and again to an sstable.
> >
> > 1. How many memtables are created when a flush happens from a commit log?
> > One per CF?
> >
> > 2. Is there any space associated with an empty memtable?
> >
> > 3. When a flush happens from a memtable to an sstable, does this create a
> > single new sstable?
> >
> > 4. Should compaction be turned off during a large data load?
> >
> > Thanks.
> >
>
> Take a look at:
>
> http://wiki.apache.org/cassandra/MemtableSSTable
>
> 1 and 3
> Memtables flush for three reasons: size, time, and number of
> operations. There is one memtable per column family. Each memtable
> flushes individually.
>
> 2. Is this a trick question?
>
> 4. Should compaction be turned off during a large data load?
> You can disable compaction during bulk loads. This can help because
> otherwise the same data might be compacted multiple times. However, if
> you go too long with compaction turned off you end up with multiple
> sstables. This can end up in fragmented rows.
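
As a rough illustration of the flush rules quoted above (one memtable per column family, each flushing on its own when a size, time, or operation-count threshold is hit), here is a toy Python model. It is only a sketch and not Cassandra code: the class name `ToyMemtable`, the threshold parameters, and the column family names are all made up for the example, and the real thresholds and flush mechanics may differ.

```python
import time


class ToyMemtable:
    """Toy model of a single column family's memtable (hypothetical, not Cassandra code)."""

    def __init__(self, max_bytes, max_ops, max_age_s, clock=time.monotonic):
        # All three thresholds are illustrative assumptions.
        self.max_bytes = max_bytes
        self.max_ops = max_ops
        self.max_age_s = max_age_s
        self.clock = clock
        self.sstables = []  # each flush appends exactly one new "sstable"
        self._reset()

    def _reset(self):
        # Start a fresh, empty memtable.
        self.rows = {}
        self.bytes = 0
        self.ops = 0
        self.created = self.clock()

    def should_flush(self):
        # An empty memtable has nothing to flush.
        if self.ops == 0:
            return False
        # Flush when any one of the three limits is reached: size, ops, or age.
        return (
            self.bytes >= self.max_bytes
            or self.ops >= self.max_ops
            or self.clock() - self.created >= self.max_age_s
        )

    def write(self, key, value):
        self.rows[key] = value
        self.bytes += len(key) + len(value)
        self.ops += 1
        # In this toy, thresholds (including age) are only checked on write.
        if self.should_flush():
            self.flush()

    def flush(self):
        # One flush produces one sorted, immutable snapshot.
        self.sstables.append(dict(sorted(self.rows.items())))
        self._reset()


# One memtable per column family; each flushes independently of the others.
memtables = {
    "Users": ToyMemtable(max_bytes=1024, max_ops=3, max_age_s=3600),
    "Events": ToyMemtable(max_bytes=1024, max_ops=100, max_age_s=3600),
}

for i in range(3):
    memtables["Users"].write(f"u{i}", "x")
    memtables["Events"].write(f"e{i}", "y")
```

In this run the hypothetical "Users" memtable reaches its operation limit and flushes once, while "Events" keeps its three rows in memory, which is the per-column-family independence the reply describes.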