> Following your suggestions, of using key of super column as range token
> won't I have a storage problem?

You won't get me to proclaim that you won't have a storage problem ;)
If you're going to deploy this at scale, I'm sure you'll have problems
whatever you do...

> I couldn't find information about this so I'll just ask: If I have a
> (Super/)ColumnFamily that contains 1 "key" for the row but that row contains
> millions of k:v entries. Would that be an efficient Cassandra design?
> Does cassandra store a CF row on a single node or can it / should it
> distribute this data?
> Would having millions of k:v entries in a single row of a CF be
> considered a good practice? (in terms of query time, range scans and co ?)

The replication set/distribution is on a per-row basis, so you
generally don't want individual rows to be a significant part of the
entire data set.

You definitely don't want super columns that are huge; the columns
within an individual super column aren't indexed on disk, for one thing.

Having large rows with lots of columns... maybe. In general it's
certainly supported, but as for the overall impact if you intend to
have relatively few rows that are all very large - I don't want to say
too much here. Anyone else? (I'm thinking of anti-entropy granularity,
compaction in-memory thresholds and GC tweaking, etc.)

-- 
/ Peter Schuller
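
To make the per-row placement point concrete, here is a toy sketch (not Cassandra's actual code). It assumes a simple hash ring with one token per node, a hypothetical four-node cluster, a replication factor of 2, and a made-up `user42:<bucket>` row-key scheme; all of those names and numbers are illustrative assumptions, not anything from this thread.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

# Hypothetical cluster and replication factor, for illustration only.
NODES = ["node-a", "node-b", "node-c", "node-d"]
REPLICATION_FACTOR = 2


def token(key):
    """Hash a key to a position on the ring (MD5, as an assumption)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


# One token per node, sorted by ring position.
RING = sorted((token(name), name) for name in NODES)
TOKENS = [t for t, _ in RING]


def replicas(row_key):
    """Return the nodes holding a row: the first node whose token is
    >= the row's token, plus the following nodes around the ring."""
    start = bisect_left(TOKENS, token(row_key)) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(REPLICATION_FACTOR)]


def placement(row_keys):
    """Count how many column writes land on each node, given the row key
    each column is written under."""
    counts = Counter()
    for key in row_keys:
        for node in replicas(key):
            counts[node] += 1
    return counts


# 100,000 columns all stored under a single row key: placement depends
# only on the row key, so the same replicas take every write.
one_row = placement(["user42"] * 100_000)

# The same 100,000 columns spread over 100 bucketed row keys: each
# bucket is placed independently on the ring.
bucketed = placement("user42:%d" % (i % 100) for i in range(100_000))

print("single row:", dict(one_row))
print("bucketed:  ", dict(bucketed))
```

With a single row key, only REPLICATION_FACTOR nodes ever show up in the counts, however many columns the row holds. Splitting the data across several row keys is one way to let it spread, at the cost of having to query multiple rows.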
