If you had Cassandra 2.0.x (possibly before) and upgraded to Cassandra 2.1, you 
may have had

commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 25

in your cassandra.yaml.

It turned out that this was pretty much broken in 2.0 (i.e. fsyncs just 
happened immediately), but fixed in 2.1, which meant that every mutation 
blocked its writer thread for 25 ms. At 80 mutations/sec per writer thread 
(80 x 25 ms = 2000 ms) you’d start DROPPING mutations if your write timeout 
is 2000 ms.
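To make that arithmetic concrete, here is a rough sketch in Python. It assumes a deliberately simplified model (not Cassandra's actual scheduling) in which each mutation holds its writer thread for one full batch window, so mutations on one thread are acked one after another. The function and constant names are made up for illustration.

```python
# Simplified model (an assumption, not Cassandra's real scheduler):
# each mutation blocks its writer thread for one full batch window.
BATCH_WINDOW_MS = 25      # commitlog_sync_batch_window_in_ms from the config above
WRITE_TIMEOUT_MS = 2000   # the write timeout mentioned above


def ack_time_ms(position, window_ms=BATCH_WINDOW_MS):
    """Time until the mutation at `position` (1-based) on one writer thread is acked."""
    return position * window_ms


def hits_timeout(position, window_ms=BATCH_WINDOW_MS, timeout_ms=WRITE_TIMEOUT_MS):
    """True if that mutation would wait at least as long as the write timeout."""
    return ack_time_ms(position, window_ms) >= timeout_ms


print(ack_time_ms(80))    # 2000
print(hits_timeout(79))   # False
print(hits_timeout(80))   # True
```

Under that assumption, the 80th mutation lined up behind a single writer thread is the first one to reach the 2000 ms timeout.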

This turns out to be a massive problem if you write fast, so the default 
commitlog_sync_batch_window_in_ms was changed to 2 ms in 2.1.6 as a way of 
addressing this (with some suggesting 1 ms).
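As a rough illustration of why a smaller window helps, the sketch below compares the three window values mentioned here. It assumes the same simplified picture as above (every mutation costs its writer thread one full window, with a 2000 ms write timeout). The helper name is hypothetical and the numbers are only back-of-the-envelope.

```python
WRITE_TIMEOUT_MS = 2000  # write timeout assumed throughout this post


def mutations_within_timeout(window_ms, timeout_ms=WRITE_TIMEOUT_MS):
    """Back-to-back mutations one writer thread can ack before the timeout
    elapses, assuming each one costs a full batch window."""
    return timeout_ms // window_ms


for window_ms in (25, 2, 1):
    print(window_ms, mutations_within_timeout(window_ms))
# 25 80
# 2 1000
# 1 2000
```

In this model, going from 25 ms to 2 ms gives each writer thread about 12x more headroom before it reaches the timeout.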

Neither of these changes got much fanfare except an eventual reference in 
CHANGES.txt.

With 2.1.9, if you aren’t doing periodic sync, I think the new behavior is 
just to sync whenever the commit logs have a consistent/complete set of 
mutations ready.

Note that this is hard to diagnose because the CPU is idle and pretty much all 
latency metrics (except the overall coordinator write) do not count this time 
(and you probably weren’t noticing the 25 ms write ACK time). It turned out 
for us that one of our nodes was getting more writes (> 20k mutations per 
second), which was about the magic number: anything shy of that and everything 
looked fine, but just by going slightly over, this node was dropping lots of 
mutations.



