Hi,

On Feb 21, 2013, at 7:52 , Kanwar Sangha <kan...@mavenir.com> wrote:
> Hi – Can someone explain the worst case IOPS for a read ? No key cache, No 
> row cache, sampling rate say 512.
>  
> 1)      Bloom filter will be checked to see existence of key (In RAM)
> 2)      Index filer sample (IN RAM) will be checked to find approx. location 
> in index file on disk
> 3)      1 IOPS to read the actual index file on disk (DISK)
> 4)      1 IOPS to get the data from the location in the sstable (DISK)
>  
> Is this correct ?

As you were asking for the worst case, I would still add one step: a seek 
inside the SSTable from the row start to the queried columns, using the column 
index.

However, this applies only if you are querying a subset of the columns in the 
row (not all of them) and the total row size exceeds column_index_size_in_kb 
(defaults to 64 KB).

So, as far as I have understood, the worst case steps (without any caches) are:

1. Check the SSTable bloom filters (in memory)
2. Use the index samples to find the approximate position in the key index 
file (in memory)
3. Read the key index file until the correct key is found (1st disk seek & read)
4. Seek to the start of the row in the SSTable file and read the row headers 
(possibly including the column index) (2nd seek & read)
5. Using the column index, seek to the correct place inside the SSTable file to 
actually read the columns (3rd seek & read)

If the row is very wide and you are asking for a random bunch of columns from 
here and there, step 5 might even be needed multiple times. Also, if your 
row is spread over many SSTables, each of them needs to be accessed (at least 
steps 1-4) to get the complete results for the query.

With all this in mind, if your node serves any reasonable amount of reads, I'd 
say that in practice the key index files will be page cached by the OS very 
quickly, and thus a normal read would end up being either one seek (for small 
rows without the column index) or two (for wider rows). Of course, as Peter 
already pointed out, the more columns you ask for, the more the disk needs to 
read. For a contiguous set of columns the read should be sequential, however.
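
To make the counting concrete, here is a rough Python sketch of the seek 
arithmetic described above. It is only a model of my understanding, not 
Cassandra code: the function name and all parameter names are made up for 
illustration, and it ignores bloom filter false positives and any caching 
other than the OS page cache for the key index file.

```python
def disk_seeks_per_read(sstables_with_row, row_size_kb, column_ranges=1,
                        reads_whole_row=False, column_index_size_kb=64,
                        index_in_page_cache=False):
    """Estimate disk seeks for one row read with no key cache or row cache.

    All names here are hypothetical; this only mirrors the steps in the text.
    """
    # The column index only matters when a subset of columns is requested
    # from a row larger than column_index_size_in_kb.
    uses_column_index = (not reads_whole_row
                         and row_size_kb > column_index_size_kb)

    per_sstable = 0
    if not index_in_page_cache:
        per_sstable += 1              # read the key index file from disk
    per_sstable += 1                  # seek to row start, read row headers
    if uses_column_index:
        per_sstable += column_ranges  # one seek per scattered column range

    # Every SSTable holding a fragment of the row has to be visited.
    return sstables_with_row * per_sstable


if __name__ == "__main__":
    print(disk_seeks_per_read(1, row_size_kb=1000))
    print(disk_seeks_per_read(1, row_size_kb=10, index_in_page_cache=True))
    print(disk_seeks_per_read(1, row_size_kb=1000, index_in_page_cache=True))
```

The first call is the cold worst case for one SSTable and one column range 
(three seeks); the other two are the page-cached cases mentioned above (one 
seek for a small row, two for a wide one). Raising sstables_with_row or 
column_ranges shows how quickly a fragmented wide row gets expensive.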

-Jouni
