Hi Jonathan,

We are currently running the DataStax AMI on Amazon, with Cassandra at version 1.1.2.
I guess that the DataStax repo (deb http://debian.datastax.com/community stable main) will be updated directly to 1.1.6?

"Replaying already-flushed data a second time is harmless -- except for counters. So, to avoid replaying flushed counter data, we recommend performing drain when shutting down the pre-1.1.6 C* prior to upgrade."

I'm afraid I will forget to drain my node before my next update, or update + expand. Could you ask your team to add this specific warning to your documentation? For example here (we usually update to the latest stable release before expanding):

http://www.datastax.com/docs/1.1/install/expand_ami

or here:

http://www.datastax.com/docs/1.1/install/upgrading

or in any other place where it could be useful.

Having counters replayed would make a big mess in our app. I guess there are more people in our situation who could save a lot of time and money with up-to-date documentation.

Anyway, thank you for this bug fix and this warning.

Alain

2012/10/17 Jonathan Ellis <[email protected]>

> I wanted to call out a particularly important bug for those who aren't
> in the habit of reading CHANGES.
>
> Summary: the bug was fixed in 1.1.5, with a follow-on fix for 1.1.6
> that only affects users of 1.1.0 .. 1.1.4. Thus, if you upgraded from
> 1.0.x or earlier directly to 1.1.5, you're okay as far as this is
> concerned. But if you used an earlier 1.1 release, you should upgrade
> to 1.1.6.
>
> Explanation:
>
> A rewrite of the commitlog code for 1.1.0 used Java's nanotime API to
> generate commitlog segment IDs. This could cause data loss in the
> event of a power failure, since we assume commitlog IDs are strictly
> increasing in our replay logic. Simplified, the replay logic looks like
> this:
>
> 1. Take the most recent flush time X for each columnfamily
> 2. Replay all activity in the commitlog that occurred after X
>
> The problem is that nanotime gets effectively a new random seed after
> a reboot.
> If the new seed is substantially below the old one, any new
> commitlog segments will never be "after" the pre-reboot flush
> timestamps. Subsequently, restarting Cassandra will not replay any
> unflushed updates.
>
> We fixed the nanotime problem in 1.1.5 (CASSANDRA-4601). But, we
> didn't realize the implications for replay timestamps until later
> (CASSANDRA-4782). To fix these retroactively, 1.1.6 sets the flush
> time of pre-1.1.6 sstables to zero. Thus, the first startup of 1.1.6
> will result in replaying the entire commitlog, including data that may
> have already been flushed.
>
> Replaying already-flushed data a second time is harmless -- except for
> counters. So, to avoid replaying flushed counter data, we recommend
> performing drain when shutting down the pre-1.1.6 C* prior to upgrade.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
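
For anyone who wants to see the failure mode from the quoted explanation in concrete terms, here is a minimal Python sketch of the simplified two-step replay logic. It is not Cassandra's actual code: the `replay` function, the `Segment` type, and all the numbers are hypothetical, and a commitlog segment is modelled as nothing more than an id plus a list of counter increments.

```python
from typing import List, Tuple

# Hypothetical model: (segment id, counter increments logged in that segment).
Segment = Tuple[int, List[int]]


def replay(segments: List[Segment], flush_position: int) -> List[int]:
    """Return the increments a simplified replay would re-apply.

    Step 1 of the quoted logic is the caller passing in flush_position (X);
    step 2 is the filter below: only segments with an id after X are replayed.
    """
    replayed: List[int] = []
    for segment_id, increments in sorted(segments):
        if segment_id > flush_position:
            replayed.extend(increments)
    return replayed


# Assumed numbers: a flush was recorded at a large nanotime-derived id.
flush_position = 9_000_000

# After a reboot the nanotime origin differs, so new ids can be far lower.
unflushed_after_reboot: List[Segment] = [(1_000, [1, 1]), (2_000, [1])]

# The bug: nothing is "after" X, so unflushed increments are silently skipped.
lost = replay(unflushed_after_reboot, flush_position)

# The 1.1.6 behaviour as described: flush time treated as zero, so all replays.
recovered = replay(unflushed_after_reboot, 0)

# Side effect for counters: already-flushed increments are applied again.
flushed_counter_value = 3
already_flushed: List[Segment] = [(500, [1, 1, 1])]
double_counted = flushed_counter_value + sum(replay(already_flushed, 0))

print(lost, recovered, double_counted)
```

In this toy model the last case is the one that worries us: a counter that was already flushed at 3 ends up at 6 after the full replay. As I understand the recommendation, draining before shutting down the pre-1.1.6 node is meant to leave nothing in the commitlog that could be applied a second time.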
