MemtablePostFlusher (MPF) is indeed pretty lightweight, but since its job is to mark the commitlog replay position after a flush -- which has to be done in flush order to preserve correctness in failure scenarios -- you'll see the pending op count go up when you have multiple flushes happening. This is expected.
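To illustrate why the pending count climbs, here is a toy model in Python. It is not Cassandra's implementation, and the class and method names are made up for this sketch. It only assumes what is described above: post-flush work must complete in the order the flushes were started, so one slow flush at the head holds back every later one.

```python
from collections import deque


class OrderedPostFlusher:
    """Toy model (hypothetical names): post-flush tasks run strictly in flush order."""

    def __init__(self):
        self._queue = deque()        # flush ids, in the order the flushes were started
        self._finished = set()       # flushes whose data has been written out
        self.replay_position = None  # last flush id marked as safe in the commit log

    def submit(self, flush_id):
        """Register a post-flush task when its flush starts."""
        self._queue.append(flush_id)

    def flush_finished(self, flush_id):
        """Record that a flush completed, then drain the queue from the head."""
        self._finished.add(flush_id)
        # Only the head of the queue may advance the replay position;
        # anything behind an unfinished flush has to wait.
        while self._queue and self._queue[0] in self._finished:
            done = self._queue.popleft()
            self._finished.discard(done)
            self.replay_position = done

    @property
    def pending(self):
        return len(self._queue)


if __name__ == "__main__":
    mpf = OrderedPostFlusher()
    for flush_id in range(4):
        mpf.submit(flush_id)

    # Flushes 1..3 finish quickly, flush 0 is still writing.
    for flush_id in (3, 2, 1):
        mpf.flush_finished(flush_id)
    print("pending while flush 0 is slow:", mpf.pending)

    # Once flush 0 completes, everything behind it drains at once.
    mpf.flush_finished(0)
    print("pending after flush 0:", mpf.pending, "replay position:", mpf.replay_position)
```

In this model the post-flush step itself does almost no work. The backlog comes entirely from waiting on the oldest outstanding flush.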
Your real problem is the 17000 pending mutations, the 22 active + pending flushes, and probably compaction activity as well. (Also, if each of those pending mutations is 10,000 columns, you may be causing yourself memory pressure as well.)

On Wed, Apr 27, 2011 at 11:01 AM, Terje Marthinussen <tmarthinus...@gmail.com> wrote:
> 0.8 trunk:
>
> When playing back a fairly large chunk of hints, things basically locks up
> under load.
> The hints are never processed successfully. Lots of Mutations dropped.
>
> One thing is that maybe the default 10k columns per send with 50ms delays is
> a bit on the aggressive side (10k*20 =200.000 columns in a second?), the
> other thing is that it seems like the whole memtable flushing locks up.
>
> I tried to increase number of memtable flushers and queue a bit (8
> concurrent flushers) to make things work, but no luck.
>
> Pool Name                 Active   Pending   Completed
> ReadStage                      0         0           1
> RequestResponseStage           0         0     2236304
> MutationStage                100     17564     4011533
> ReadRepairStage                0         0           0
> ReplicateOnWriteStage          0         0           0
> GossipStage                    0         0        2281
> AntiEntropyStage               0         0           0
> MigrationStage                 0         0           0
> MemtablePostFlusher            1        13          50
> StreamStage                    0         0           0
> FlushWriter                    8        14          73
> MiscStage                      0         0           0
> FlushSorter                    0         0           0
> InternalResponseStage          0         0           0
> HintedHandoff                  1         8           3
>
> A quick source code scan makes me believe that the MemtablePostFlusher
> should not normally use a lot of time, but it seem like it does so here.
> What may cause this?
>
> Terje
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com