"does it typically have to read in the entire SStable into memory (assuming the bloom filter said yes)?" --> No, it would be perf killer.
On the read path, after the Bloom filter, Cassandra checks the "Partition Key Cache" to see whether the partition it is looking for is present there.

If yes, it gets the offset (from the beginning of the SSTable), which lets it skip a lot of data and move the disk head directly there.

If not, it relies on the "Partition sample" to move the disk head to the nearest location of the sought partition.

If compression is on (the default), there is one more step before hitting disk: the compression offset map. It is a translation table that maps uncompressed file offsets to compressed file offsets.

On Wed, Sep 24, 2014 at 10:07 PM, Donald Smith <[email protected]> wrote:

> We’re using cassandra as a key-value store; our values are small. So
> we’re thinking we don’t need much disk readahead (e.g., “blockdev --getra
> /dev/sda”). We’re using SSDs.
>
> When cassandra does disk seeks to satisfy read requests does it typically
> have to read in the entire SStable into memory (assuming the bloom filter
> said yes)? If cassandra needs to read in lots of blocks anyway or if it
> needs to read the entire file during compaction then I'd expect we might as
> well have a big readahead. Perhaps there’s a tradeoff between read
> latency and compaction time.
>
> Any feedback welcome.
>
> Thanks
>
> *Donald A. Smith* | Senior Software Engineer
> P: 425.201.3900 x 3866
> C: (206) 819-5965
> F: (646) 443-2333
> [email protected]
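
To make the lookup order in the reply above concrete, here is a rough Python sketch. It is a toy model, not Cassandra's code: the function name `locate_partition`, the 64 KiB `CHUNK_SIZE`, the plain `set` standing in for the Bloom filter, and the integer keys are all assumptions made up for illustration. It also simplifies the sample step by treating the sampled offset as the place to start reading.

```python
import bisect

CHUNK_SIZE = 64 * 1024  # hypothetical uncompressed chunk length, for illustration only


def locate_partition(key, bloom, key_cache, sample_keys, sample_offsets,
                     chunk_offsets=None):
    """Return (source, offset) saying where to start reading, or None.

    bloom          -- stand-in for the Bloom filter (a plain set here)
    key_cache      -- dict: key -> exact uncompressed offset of the partition
    sample_keys    -- sorted list of sampled keys
    sample_offsets -- uncompressed offset for each sampled key
    chunk_offsets  -- if compression is on, compressed start of each chunk
    """
    # Step 1: the Bloom filter says "definitely not here" or "maybe here".
    if key not in bloom:
        return None

    # Step 2: a key cache hit gives the exact offset.
    if key in key_cache:
        source, offset = "key_cache", key_cache[key]
    else:
        # Step 3: otherwise use the sample to find the nearest entry
        # at or before the key, and start reading from there.
        i = bisect.bisect_right(sample_keys, key) - 1
        if i < 0:
            return None  # key sorts before the first sampled entry
        source, offset = "index_sample", sample_offsets[i]

    # Step 4: with compression on, translate the uncompressed offset into
    # the compressed-file offset of the chunk that contains it.
    if chunk_offsets is not None:
        offset = chunk_offsets[offset // CHUNK_SIZE]

    return source, offset


# Made-up example data.
bloom = {10, 25, 70}
key_cache = {25: 200000}
sample_keys = [0, 50, 100]
sample_offsets = [0, 131072, 262144]
chunk_offsets = [0, 30000, 61000, 95000, 120000]

hit = locate_partition(25, bloom, key_cache, sample_keys, sample_offsets, chunk_offsets)
miss = locate_partition(70, bloom, key_cache, sample_keys, sample_offsets, chunk_offsets)
```

In every branch the result is a single offset to seek to, so only the data around that offset gets read, never the whole SSTable.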
