The simple answer is that you should upgrade from 2.0 to avoid this behaviour.
You are correct that when the commit log gets full, memtables are
flushed to make room.  2.0 has several interrelated problems here, though:

There is a maximum flush queue length property (I cannot recall its name),
and once that many memtables are queued for flushing, no more writes can take
place on the node at all.  You cannot simply increase this length, though,
because doing so shrinks the maximum size of any single memtable (it
is, iirc, total_memtable_space / (1 + flush_writers + max_queue_length)),
which worsens write amplification from compaction.

This is because the memory management for memtables in 2.0 was really
terrible, and this queue length was used to try to ensure the space
allocated was not exceeded.
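
To make that trade-off concrete, here is a rough back-of-the-envelope sketch in Python of the formula above; the single flush writer and queue length of 4 are assumed defaults (treat them as illustrative), and the 2G total memtable space is the figure from your mail below:

    # Illustrative only: per-memtable ceiling implied by the formula above.
    total_memtable_space_mb = 2048   # heap/4 for an 8G heap, per the figures below
    flush_writers = 1                # assumption for this example
    max_queue_length = 4             # assumption for this example

    per_memtable_limit_mb = total_memtable_space_mb / (1 + flush_writers + max_queue_length)
    print(per_memtable_limit_mb)     # ~341 MB; a longer queue shrinks this further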

Compounding this, when clearing the commit log 2.0 will flush all memtables
with data in them, regardless of whether it is useful to do so.  This means
that having more actively written tables than your max queue length will
necessarily cause stalls every time you run out of commit log space.
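
As a rough illustration of that condition (the numbers here are assumptions, not measurements from your cluster):

    # Illustrative only: the stall condition described above.
    max_queue_length = 4              # assumed default
    dirty_tables_in_segment = 9       # hypothetical count of actively written tables

    # Every forced flush beyond the queue length blocks the write path until a
    # slot frees up, so MutationStage backs up in the meantime.
    will_stall = dirty_tables_in_segment > max_queue_length
    print(will_stall)                 # True for these numbers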

In 2.1, none of these concerns apply.


On 24 August 2016 at 23:40, Vasileios Vlachos <vasileiosvlac...@gmail.com>
wrote:

> Hello,
>
> We have an 8-node cluster spread out in 2 DCs, 4 nodes in each one. We run
> C* 2.0.17 on Ubuntu 12.04 at the moment.
>
> Our C# application often throws errors, which correlate with dropped
> messages (usually counter mutations) in our Cassandra logs. We think that if
> a specific mutation stays in the queue for more than 5 seconds, Cassandra
> drops it. This is also suggested by these lines in system.log:
>
> ERROR [ScheduledTasks:1] 2016-08-23 13:29:51,454 MessagingService.java (line 912) MUTATION messages were dropped in last 5000 ms: 317 for internal timeout and 0 for cross node timeout
> ERROR [ScheduledTasks:1] 2016-08-23 13:29:51,454 MessagingService.java (line 912) COUNTER_MUTATION messages were dropped in last 5000 ms: 6 for internal timeout and 0 for cross node timeout
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line 55) Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line 70) ReadStage                         0         0      245177190         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line 70) RequestResponseStage              0         0     3530334509         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line 70) ReadRepairStage                   0         0        1549567         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line 70) MutationStage                    48      1380     2540965500         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line 70) ReplicateOnWriteStage             0         0      189615571         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line 70) GossipStage                       0         0       20586077         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line 70) CacheCleanupExecutor              0         0              0         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line 70) MigrationStage                    0         0            106         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line 70) MemoryMeter                       0         0         303029         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line 70) ValidationExecutor                0         0              0         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line 70) FlushWriter                       1         5         322604         1              8227
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line 70) InternalResponseStage             0         0             35         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line 70) AntiEntropyStage                  0         0              0         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line 70) MemtablePostFlusher               1         5         424104         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line 70) MiscStage                         0         0              0         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,460 StatusLogger.java (line 70) PendingRangeCalculator            0         0             37         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,460 StatusLogger.java (line 70) commitlog_archiver                0         0              0         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 70) CompactionExecutor                4         4        5144499         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 70) HintedHandoff                     0         0           3194         0                 0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 79) CompactionManager                 1         4
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 81) Commitlog                       n/a         0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 93) MessagingService                n/a       0/0
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line 103) Cache Type                     Size                 Capacity               KeysToSave
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line 105) KeyCache                  104828280                104857600                      all
>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line 111) RowCache                          0                        0                      all
>
>
>
> So far we have noticed that when our application throws errors, Cassandra
> will have FlushWriter Active=1 and Pending=4+ tasks (as shown above).
> Just before the tpstats-like output that appears in system.log we seem to
> have consecutive flushes which take anywhere between 7 and 20 seconds, as
> shown below:
>
> INFO [FlushWriter:26567] 2016-08-22 21:15:30,550 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStats/statsperdealer/LIVEStats-statsperdealer-jb-50127-Data.db (640031 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
> INFO [FlushWriter:26567] 2016-08-22 21:15:30,678 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStats/statsperdealerpervrm/LIVEStats-statsperdealerpervrm-jb-33131-Data.db (893342 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
> INFO [FlushWriter:26567] 2016-08-22 21:15:30,706 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStats/statsperserver/LIVEStats-statsperserver-jb-48158-Data.db (4862 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>
> INFO [FlushWriter:26567] 2016-08-22 21:15:33,881 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/Organami/sessionstate/Organami-sessionstate-jb-55718-Data.db (76781692 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>
> INFO [FlushWriter:26567] 2016-08-22 21:15:35,314 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStock/stock/LIVEStock-stock-jb-36747-Data.db (9302046 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=8363)
> INFO [FlushWriter:26567] 2016-08-22 21:15:35,341 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018208-Data.db (161 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=226441)
> INFO [FlushWriter:26567] 2016-08-22 21:15:38,719 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/DataMining/quotebyquotereference/DataMining-quotebyquotereference-jb-579845-Data.db (18754778 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=325555)
> INFO [FlushWriter:26567] 2016-08-22 21:15:38,770 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018209-Data.db (167 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=429568)
> INFO [FlushWriter:26567] 2016-08-22 21:15:38,793 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018210-Data.db (161 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=430537)
> INFO [FlushWriter:26567] 2016-08-22 21:15:38,813 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018211-Data.db (158 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=433145)
>
> Even though no individual memtable flush seems to take very long, the
> commitlog segment id for all of the flushes shown above is the same. We
> think this happens because the commit log reached its limit and is rotating
> this particular segment, which triggers flush activity for all memtables
> referenced in that segment.
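>
> As a sanity check, a rough Python sketch along the following lines (illustrative only; it simply assumes the log format shown above) could group the "Completed flushing" lines by segmentId and count the distinct tables involved:
>
>     import re
>     from collections import defaultdict
>
>     # Illustrative sketch: group "Completed flushing" log lines by commitlog
>     # segmentId and count how many distinct tables each segment forced out.
>     pattern = re.compile(
>         r"Completed flushing (\S+) \(\d+ bytes\) for commitlog position "
>         r"ReplayPosition\(segmentId=(\d+), position=\d+\)")
>
>     tables_per_segment = defaultdict(set)
>     with open("system.log") as log:
>         for line in log:
>             match = pattern.search(line)
>             if match:
>                 sstable_path, segment_id = match.groups()
>                 table = sstable_path.rsplit("/", 2)[-2]  # parent directory = table
>                 tables_per_segment[segment_id].add(table)
>
>     for segment_id, tables in sorted(tables_per_segment.items()):
>         print(segment_id, len(tables), sorted(tables))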
>
> Our questions:
>
> 1. Will writes be blocked for the tables listed in the above FlushWriter
> output while this commitlog segment is being rotated? If so, will the
> blocking last for the entire duration or just for the time taken for the
> respective memtable to be flushed to disk?
>
> 2. We have also noticed that flushes can be triggered by MemoryMeter
> (probably when the occupancy fraction reaches the specified limit, at which
> point the largest memtable is flushed to disk). Would it be wise to
> increase the commitlog total size, so that MemoryMeter is the main trigger
> for flush activity? When flushes are triggered due to "memory pressure" we
> never seem to get errors in our application.
>
> 3. Is there a recommended value for the ratio (total size of Commitlog
> segments)/(total size of Memtables)? We use the defaults for both, which
> makes the total size for Commitlog segments 1G (32 files of 32M) and the
> total size used for Memtables 2G, since we have allocated 8G in total for
> Heap.
>
> 4. The total space for all memtables seems to be Heap/4; this is 2G for
> us. Is there a maximum size for an individual memtable?
>
> 5. What is the max queue size for MemtablePostFlusher, and what exactly
> happens after a flush (which I presume relates to MemtablePostFlusher)? Is
> it concerning that we also have five pending tasks there?
>
> Many thanks,
>
>
> Vasilis
>
