potential data loss in Cassandra 1.1.0 .. 1.1.4

Jonathan Ellis Wed, 17 Oct 2012 14:27:54 -0700

I wanted to call out a particularly important bug for those who aren't
in the habit of reading CHANGES.


Summary: the bug was fixed in 1.1.5, with an follow-on fix for 1.1.6
that only affects users of 1.1.0 .. 1.1.4.  Thus, if you upgraded from
1.0.x or earlier directly to 1.1.5, you're okay as far as this is
concerned.  But if you used an earlier 1.1 release, you should upgrade
to 1.1.6.

Explanation:

A rewrite of the commitlog code for 1.1.0 used Java's nanotime api to
generate commitlog segment IDs.  This could cause data loss in the
event of a power failure, since we assume commitlog IDs are strictly
increasing in our replay logic.  Simplified, the replay logic looks like this:

1. Take the most recent flush time X for each columnfamily
2. Replay all activity in the commitlog that occurred after X

The problem is that nanotime gets effectively a new random seed after
a reboot.  If the new seed is substantially below the old one, any new
commitlog segments will never be "after" the pre-reboot flush
timestamps.  Subsequently, restarting Cassandra will not replay any
unflushed updates.

We fixed the nanotime problem in 1.1.5 (CASSANDRA-4601).  But, we
didn't realize the implications for replay timestamps until later
(CASSANDRA-4782).  To fix these retroactively, 1.1.6 sets the flush
time of pre-1.1.6 sstables to zero.  Thus, the first startup of 1.1.6
will result in replaying the entire commitlog, including data that may
have already been flushed.

Replaying already-flushed data a second time is harmless -- except for
counters.  So, to avoid replaying flushed counter data, we recommend
performing drain when shutting down the pre-1.1.6 C* prior to upgrade.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

potential data loss in Cassandra 1.1.0 .. 1.1.4

Reply via email to