> Your post refers to "obsolete" sstables, but the only thing that makes them
> "obsolete" in this case is that they have been compacted?
Yes.

> As I understand Julie's case, she is :
>
> a) initializing her cluster
> b) inserting some number of unique keys with CL.ALL
> c) noticing that more disk space (6x?) than is expected is used
> d) but that she gets expected usage if she does a major compaction
>
> In other words, the problem isn't "temporary disk space occupied during the
> compact", it's permanent disk space occupied unless she compacts.

Sorry, I re-read my previous message and I wasn't being very clear, and I was probably a bit confused too ;)

The reason I mention temporary spikes during compaction is that these are fully expected to roughly double disk space use. I did not mean to imply that this is what she is seeing, since she's specifically waiting for background compaction to complete. Rather, my point was that as long as we are only talking about roughly a doubling of disk space, regardless of its cause, it is not worse than what you may expect anyway, even if only temporarily, during active compaction.

I still have no explanation for the lingering data other than obsolete sstables waiting for the GC to trigger actual file removal. *However*, and this is what I meant with my follow-up, that still does not explain the data from her post unless 'nodetool ring' reports total sstable size rather than the total size of live sstables.

But let's suppose that's wrong and 'nodetool ring' somehow does include all sstable data; in that case the data from Julie's latest post may be consistent with obsolete sstables, since the repeated attempts at compaction/cleanup may very well have triggered a CMS sweep, thus eventually freeing the sstables (while simply leaving the cluster idle would not normally have done so).

It is also worth noting that this kind of space waste (sstables hanging around waiting for a GC) is not likely to scale to larger data sets, since the probability of triggering a CMS sweep within a reasonable period after compaction increases with the amount of traffic you throw at the cluster. So if you're e.g. writing 100 GB of data, you're pretty unlikely not to trigger a CMS sweep within the first few percent of the run (assuming you don't have a huge 250 GB heap or something). For this reason (even disregarding that Cassandra tries to trigger a GC when disk runs low), I would not expect this waste to be "cumulative" on top of any temporary spikes resulting from compaction. In effect, these effects should normally not matter, since you need the disk space to deal with the compaction spikes anyway, and you are unlikely to have both compaction spikes *and* obsolete sstables at the same time, even without the explicit GC triggering that Cassandra does.

> There is clearly overhead from there being multiple SSTables with multiple
> bloom filters and multiple indexes. But from my understanding, that does not
> fully account for the difference in disk usage she is seeing. If it is 6x
> across the whole cluster, it seems unlikely that the meta information is 5x
> the size of the actual information.

Definitely agreed. I should have made it clearer that I was only addressing the post.

Julie, are you still seeing 6x and similar factors of waste in your reproducible test cases, or was the 6x factor limited to your initial experience with real data?
The reason I'm barking up this tree is that I figure we may be observing the results of two independent problems here, and if the smaller test case can be explained away by lack of GC, then it probably doesn't help in figuring out what happened in the original problem scenario.

Hmm. Maybe a 'nodetool gc' would be in order, to make it easy to trigger a GC without going through jconsole/JMX manually (or a small standalone client like the sketch above).

> I haven't been following this thread very closely, but I don't think
> "obsolete" SSTables should be relevant, because she's not doing UPDATE or
> DELETE and she hasn't changed cluster topography (the "cleanup" case).

Even a workload strictly limited to writing new data with unique keys and column names will cause sstables to become obsolete, as long as you write enough data to reach the compaction threshold.

Just to be clear, my perhaps-misleading use of the term "obsolete" only refers to sstables that have been successfully compacted together with others into a new sstable which replaces the old ones (hence making them "obsolete"). I do not mean to imply that they contain obsolete columns.

--
/ Peter Schuller