On Mon, Jul 23, 2012 at 1:25 PM, Mike Heffner <m...@librato.com> wrote:
> Hi,
>
> We are migrating from a 0.8.8 ring to a 1.1.2 ring and we are noticing
> missing data post-migration. We use pre-built/configured AMIs, so our
> preferred route is to leave our existing production 0.8.8 ring untouched and
> bring up a parallel 1.1.2 ring and migrate data into it. Data is written to
> the rings via batch processes, so we can easily ensure that both the
> existing and new rings will have the same data post-migration.
>
> <snip>
>
> The steps we are taking are:
>
> 1. Bring up a 1.1.2 ring in the same AZ/data center configuration, with
>    tokens matching the corresponding nodes in the 0.8.8 ring.
> 2. Create the same keyspace on 1.1.2.
> 3. Create each CF in the keyspace on 1.1.2.
> 4. Flush each node of the 0.8.8 ring.
> 5. Rsync each non-compacted sstable from 0.8.8 to the corresponding node
>    in 1.1.2.
> 6. Move each 0.8.8 sstable into the 1.1.2 directory structure by renaming
>    the file to the /cassandra/data/<keyspace>/<cf>/<keyspace>-<cf>... format.
>    For example, for the keyspace "Metrics" and CF "epochs_60" we get:
>    "cassandra/data/Metrics/epochs_60/Metrics-epochs_60-g-941-Data.db".
> 7. On each 1.1.2 node, run `nodetool -h localhost refresh Metrics <CF>` for
>    each CF in the keyspace. We notice that storage load jumps accordingly.
> 8. On each 1.1.2 node, run `nodetool -h localhost upgradesstables`. This
>    takes a while, but appears to correctly rewrite each sstable in the new
>    1.1.x format. Storage load drops as sstables are compressed.

So, after some further testing we've observed that the `upgradesstables`
command is removing data from the sstables, leading to our missing data.
We've repeated the steps above with several variations:

WORKS: refresh -> scrub
WORKS: refresh -> scrub -> major compaction
FAILS: refresh -> upgradesstables
FAILS: refresh -> scrub -> upgradesstables
FAILS: refresh -> scrub -> major compaction -> upgradesstables

So, we are able to migrate our test CFs from a 0.8.8 ring to a 1.1.2 ring
when we use scrub.
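For anyone following along, step 6 can be scripted. The sketch below is an illustration, not the exact script we run: it assumes the flat 0.8.8 layout `data/<keyspace>/<cf>-<version>-<gen>-<Component>.db` and the nested 1.1.x layout `data/<keyspace>/<cf>/<keyspace>-<cf>-...`; the function name and the copy-instead-of-move choice are mine.

```python
import os
import re
import shutil

# 0.8.8 sstable component names look like: epochs_60-g-941-Data.db
SSTABLE_RE = re.compile(
    r'^(?P<cf>.+)-(?P<version>[a-z]+)-(?P<gen>\d+)-(?P<component>\w+\.db)$')

def migrate_sstables(src_data_dir, dst_data_dir, keyspace):
    """Copy 0.8.8-style sstables from data/<ks>/ into the 1.1.x layout
    data/<ks>/<cf>/<ks>-<cf>-<version>-<gen>-<Component>.db.

    Returns the list of destination paths, ready for `nodetool refresh`.
    """
    src_ks_dir = os.path.join(src_data_dir, keyspace)
    moved = []
    for name in sorted(os.listdir(src_ks_dir)):
        m = SSTABLE_RE.match(name)
        if not m:
            continue  # skip snapshots, backups, anything non-sstable
        cf = m.group('cf')
        dst_dir = os.path.join(dst_data_dir, keyspace, cf)
        os.makedirs(dst_dir, exist_ok=True)
        # 1.1.x expects the keyspace name prefixed onto the filename.
        dst = os.path.join(dst_dir, '%s-%s' % (keyspace, name))
        shutil.copy2(os.path.join(src_ks_dir, name), dst)
        moved.append(dst)
    return moved
```

With the email's example, `epochs_60-g-941-Data.db` under `data/Metrics/` ends up as `data/Metrics/epochs_60/Metrics-epochs_60-g-941-Data.db`, after which `nodetool refresh Metrics epochs_60` picks it up.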
However, whenever we run an upgradesstables command the sstables shrink
significantly and our tests show missing data:

 INFO [CompactionExecutor:4] 2012-07-24 04:27:36,837 CompactionTask.java (line 109) Compacting [SSTableReader(path='/raid0/cassandra/data/Metrics/metrics_900/Metrics-metrics_900-hd-51-Data.db')]
 INFO [CompactionExecutor:4] 2012-07-24 04:27:51,090 CompactionTask.java (line 221) Compacted to [/raid0/cassandra/data/Metrics/metrics_900/Metrics-metrics_900-hd-58-Data.db,].  60,449,155 to 2,578,102 (~4% of original) bytes for 4,002 keys at 0.172562MB/s.  Time: 14,248ms.

Is there a scenario where upgradesstables would remove data that a scrub
command wouldn't? According to the documentation, it would appear that the
scrub command is actually more destructive than upgradesstables in terms of
removing data, yet on 1.1.x upgradesstables is the documented upgrade
command over a scrub.

The keyspace is defined as:

Keyspace: Metrics:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
    Options: [us-east:3]

And the column family above is defined as:

    ColumnFamily: metrics_900
      Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type)
      GC grace seconds: 0
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.1
      DC Local Read repair chance: 0.0
      Replicate on write: true
      Caching: KEYS_ONLY
      Bloom Filter FP chance: default
      Built indexes: []
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
      Compression Options:
        sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

All rows have a TTL of 30 days, so it's possible that, along with the
gc_grace=0, a small number would be removed during a
compaction/scrub/upgradesstables step. However, the majority should still
be kept, as their TTL has not expired yet.

We are still experimenting to see under what conditions this happens, but I
thought I'd send out some more info in case there is something clearly
wrong we're doing here.

Thanks,

Mike

--
Mike Heffner <m...@librato.com>
Librato, Inc.
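P.S. To make the TTL/gc_grace reasoning above concrete, here is a rough sketch of the purge decision as I understand it; the function name and constants are mine, not Cassandra's, so treat it as an illustration of the arithmetic rather than the actual compaction code:

```python
GC_GRACE_SECONDS = 0          # as in the metrics_900 CF above
TTL_SECONDS = 30 * 24 * 3600  # all rows carry a 30-day TTL

def droppable_at_compaction(write_time, now,
                            ttl=TTL_SECONDS, gc_grace=GC_GRACE_SECONDS):
    """An expiring column effectively becomes a tombstone at
    write_time + ttl, and the tombstone itself is purgeable once gc_grace
    has also elapsed. With gc_grace=0 the two moments coincide, so any
    sstable rewrite (scrub, major compaction, upgradesstables) is free to
    drop the column the instant its TTL expires."""
    expiration = write_time + ttl
    return now >= expiration + gc_grace
```

On this model, only columns written more than 30 days ago should be droppable during any of the rewrites, which is why the ~96% shrink from upgradesstables alone looks wrong: the same arithmetic applies to scrub and major compaction, yet those preserve the data.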