> Sent: Thursday, October 29, 2015 at 12:10 PM
> From: "Simon Slavin" <slavins at bigfraud.org>
> To: "SQLite mailing list" <sqlite-users at mailinglists.sqlite.org>
> Subject: Re: [sqlite] How would sqlite read this from disk?
>
> 
> On 29 Oct 2015, at 2:29pm, Jason H <jhihn at gmx.com> wrote:
> 
> > In college databases, we calculated the estimated number of blocks 
> > (512-byte device blocks) read given schema and statistics. Now, I'm asking 
> > how would SQLite actually do that?
> 
> SQLite uses a block structure in its database files, but one file block is 
> not always the same size as a storage-subsystem block.  If you're okay with 
> that, then the answer is to do it at the VFS level as Dominique described.
> 
> > I am particularly interested in knowing if SQLite can skip the reading of 
> > columns (and their disk blocks) not used in the query. that is, would Q1 
> > read on order of 128 blocks, or 16384 blocks (the whole table). And in Q2, 
> > 256 blocks, or 16384 blocks (the whole table again, irrespective of the 
> > previous statement and caches). Under what scenarios does the behavior 
> > change (indexes?)
> 
> From your question it seems that you believe that SQLite stores the data for 
> many rows in a column in the same block.  This is not the case.  SQLite 
> stores data in tables in one contiguous section per row of the table.  So the 
> data for one row, not one column, is stored in one chunk (possibly all in the 
> same block), with column 1 first, then column 2, etc..  Like this
> 
> A table
>       A row
>               column 1
>               column 2 ...
>       A different row 
>               column 1
>               column 2 ...
>       A different row ...
> 
> Let's assume that your SELECT's WHERE and ORDER BY clauses can be perfectly 
> served using an index -- either the primary key index for the table or 
> another index created using CREATE INDEX.  When this is the case, data from 
> the table is needed only once the right row has been found, and the columns 
> needed are detailed in the part between SELECT and FROM.  SQLite reads that 
> row of the table from storage, from the first column to the last column 
> needed by the SELECT, but perhaps not all the way to the end of the columns 
> in the row. [1]
> 
> So when creating a table you would theoretically get an improvement in 
> efficiency if you put the columns in decreasing order of use.  In real life 
> situations this tends to have significant effect only when you put long text 
> columns at the end of the rows.
> 
> Something similar is done when the SELECT's WHERE and ORDER BY clauses can't 
> be perfectly served using an index.  In that case the columns needed for 
> those clauses needs to be read from the table as well as the columns needed 
> by the part between SELECT and FROM.
> 
> Simon Slavin.
> 
> [1] Columns whose values are mentioned in the index used do not need to be 
> read from the table since those values are already known.  And there are some 
> details I've left out for brevity.


Thanks, this is some really great information!

If I could ask a followup question. You made the statement "SQLite reads that 
row of the table from storage, from the first column to the last column needed 
by the SELECT, but perhaps not all the way to the end of the columns in the 
row.", Given an example select that requires columns 2,3,5, does this mean that 
column 4 is also read, or is column 4 be skipped? I guess the question is I am 
assuming that a row is serialization of one or more strings that are a 
serialization of a string preceded by their lengths (i.e. [L|data] ), would 
SQLite do something akin to (using my 2,3,5 example):
begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read L3*byte, 
read L4, seek L4, read L5, read L5*byte
or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte' ?

Many thanks again!


Reply via email to