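> 2) Read from disk all row keys, in order to find one (binary search)

No. At startup Cassandra samples the -index.db component every index_interval keys and keeps that sample in memory. A lookup binary-searches the sample, then scans forward through -index.db from the nearest preceding sampled key. At worst, index_interval keys must be read from disk.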
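The sampling described above can be sketched as follows. This is a minimal Python model and not Cassandra's code. It assumes the -index.db component is a sorted in-memory list of (row key, data.db offset) pairs, and `build_sample` and `lookup` are hypothetical helpers. A binary search over the in-memory sample picks the starting point, and the forward scan then reads at most `index_interval` index entries.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical stand-in for -index.db: sorted (row_key, data_offset) pairs.
IndexEntry = Tuple[str, int]


def build_sample(index: List[IndexEntry], index_interval: int) -> List[Tuple[str, int]]:
    """Keep every index_interval-th row key in memory, with its position in the index."""
    return [(index[i][0], i) for i in range(0, len(index), index_interval)]


def lookup(index: List[IndexEntry], sample: List[Tuple[str, int]],
           index_interval: int, key: str) -> Tuple[Optional[int], int]:
    """Return (data.db offset or None, number of index entries read "from disk")."""
    sample_keys = [k for k, _ in sample]
    # In-memory binary search for the last sampled key <= key.
    i = bisect.bisect_right(sample_keys, key) - 1
    if i < 0:
        return None, 0  # key sorts before the first sampled key
    start = sample[i][1]
    reads = 0
    # Scan forward from the sampled position; never more than index_interval entries.
    for row_key, offset in index[start:start + index_interval]:
        reads += 1
        if row_key == key:
            return offset, reads
        if row_key > key:
            break  # passed the place where key would be: not present
    return None, reads
```

With an interval of 128 and 1,000 keys named `key0000` to `key0999`, the sample holds 8 entries. Looking up `key0300` starts at the sampled `key0256` and reads 45 index entries.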
> As I understand, in the worst case, we can have three disk seeks (2, 4, 6)
> per SSTable in order to check whether it contains a given column, is that
> correct?

It depends on the size of the row. For a small row (smaller than column_index_size_in_kb), getting a specific column takes:
* 1 seek in index.db
* 1 seek in data.db

> I would expect that the sorted row keys (from point 2) already contain a bloom
> filter for their columns. But the bloom filter is stored together with the column
> index, is that correct?

Yes.

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/08/2012, at 7:31 PM, Maciej Miklas wrote:

> Great articles, I did not find those before!
>
> SSTable Index - yes, I mean the column index.
>
> I would like to understand how many disk seeks might be required to find a
> column in a single SSTable.
>
> I am assuming a positive bloom filter on the row key. Now Cassandra needs to
> find out whether a given SSTable contains the column name, and this might
> require a few disk seeks:
> 1) Check the key cache; if found, go to 5)
> 2) Read from disk all row keys, in order to find one (binary search)
> 3) The found row key contains the disk offset to its column index
> 4) Read from disk the column index for our row key. The index also contains
> a bloom filter on column names
> 5) Use the bloom filter on the column name to find out whether this SSTable
> might contain our column
> 6) Read the column to finally make sure that it exists
>
> As I understand, in the worst case, we can have three disk seeks (2, 4, 6)
> per SSTable in order to check whether it contains a given column, is that
> correct?
>
> I would expect that the sorted row keys (from point 2) already contain a bloom
> filter for their columns. But the bloom filter is stored together with the column
> index, is that correct?
>
> Cheers,
> Maciej
>
> On Fri, Aug 17, 2012 at 12:06 AM, aaron morton wrote:
>> What about SSTable index,
> Not sure what you are referring to there. Each row in an SSTable has a
> bloom filter and may have an index of columns. This is not cached.
>
> See http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ or
> http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance
>
>> and Metadata?
>
> This is the metadata we hold in memory for every open SSTable:
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableMetadata.java
>
> Cheers
>
> On 16/08/2012, at 7:34 PM, Maciej Miklas wrote:
>
>> Hi all,
>>
>> The bloom filter for row keys is always in RAM. What about the SSTable index
>> and metadata?
>>
>> Are they cached by Cassandra, or do they rely on memory-mapped files?
>>
>> Thanks,
>> Maciej
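The small-row read path from the reply at the top of this thread can be sketched as a toy Python model. It makes several assumptions:

* The row-key bloom filter and the key cache live in RAM.
* Each row carries a bloom filter on its column names, stored with the row.
* Every row is smaller than column_index_size_in_kb, so no column index is consulted.

The model counts one seek for index.db, skipped on a key cache hit as in step 1 of Maciej's list, and one seek for data.db. `BloomFilter`, `SmallRowSSTable` and `get_column` are hypothetical names, not Cassandra classes. Large rows, where the column index matters, are not modelled.

```python
import hashlib
from typing import Dict, List, Optional, Set, Tuple


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into a Python int

    def _positions(self, item: str) -> List[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        # Double hashing: derive num_hashes bit positions from two base hashes.
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class SmallRowSSTable:
    """Hypothetical model of one SSTable whose rows are all smaller than
    column_index_size_in_kb, so no column index is consulted."""

    def __init__(self, rows: Dict[str, Dict[str, str]]) -> None:
        self.rows = rows                                  # stands in for data.db
        self.key_filter = BloomFilter()                   # row-key filter, always in RAM
        self.column_filters: Dict[str, BloomFilter] = {}  # stored with each row
        self.key_cache: Set[str] = set()                  # keys whose data.db offset is cached
        for key, columns in rows.items():
            self.key_filter.add(key)
            column_filter = BloomFilter()
            for name in columns:
                column_filter.add(name)
            self.column_filters[key] = column_filter

    def get_column(self, key: str, column: str) -> Tuple[Optional[str], int]:
        """Return (value or None, number of disk seeks)."""
        seeks = 0
        if not self.key_filter.might_contain(key):
            return None, seeks          # ruled out in memory, no disk access
        if key not in self.key_cache:
            seeks += 1                  # seek in index.db for the row's data.db offset
            if key not in self.rows:
                return None, seeks      # row-key filter false positive
            self.key_cache.add(key)
        seeks += 1                      # seek in data.db: row plus its column filter
        if not self.column_filters[key].might_contain(column):
            return None, seeks          # column ruled out by the row's filter
        return self.rows[key].get(column), seeks
```

In this model the first lookup of a row costs two seeks, matching the small-row answer above. Repeated lookups cost one, because the key cache removes the index.db seek. A key rejected by the row-key bloom filter costs none.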
