On 29 Oct 2015, at 5:20pm, Jason H <jhihn at gmx.com> wrote:

> Thanks, this is some really great information!

You're welcome.

> If I could ask a followup question. You made the statement "SQLite reads that 
> row of the table from storage, from the first column to the last column 
> needed by the SELECT, but perhaps not all the way to the end of the columns 
> in the row.", Given an example select that requires columns 2,3,5, does this 
> mean that column 4 is also read, or is column 4 be skipped? I guess the 
> question is I am assuming that a row is serialization of one or more strings 
> that are a serialization of a string preceded by their lengths (i.e. [L|data] 
> ), would SQLite do something akin to (using my 2,3,5 example):
> begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read 
> L3*byte, read L4, seek L4, read L5, read L5*byte
> or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte' ?

You are missing some useful information.  Whenever SQLite has to change even 
one variable-length column in a row it rewrites the whole row.  It has no way 
of storing a short column in the space originally used by a longer one.  So 
even if all you do is

UPDATE t SET c5='short string' WHERE c5='the original extremely long string'

the new copy of any of those rows has the new c6 immediately following the new 
c5.  If you've read to the end of c5 then you're already pointing to c6 so 
there's no need to seek it.  Contriwise, the only way to seek c6 is to read all 
the columns before it.  There is no pointer to the beginning of c6 and no magic 
mark that says "Column c6 starts here".

I'll answer your specific question (I hope) with an example.

CREATE TABLE t (c1, c2, c3, ... c8);
CREATE INDEX t_23 ON myTable (c2, c3);
SELECT c1, c3, c5, c6, FROM t WHERE c2='Smith' ORDER BY c3;

Given the above, the clauses in the SELECT can be perfectly satisfied by the 
index, but that index doesn't supply the c1, c5 or c6 needed by the SELECT.

First SQLite figures out which rows, are needed to answer the query.  And in 
doing that it decides to use index t_23 which means that for each of those rows 
it already has the value of c3.  So knows which rows to read, and it knows that 
it still needs c1, c5 and c6 from each of those rows.  The rows go up to c8, 
but it doesn't need any column past c6.

So for each row needed, SQLite is pointed to the beginning of c1 for that row.  
There's no way of knowing where to find c5 without reading all the preceding 
columns, because any of those may be any length.  So it progresses through all 
the columns from c1 to c6 using the length-indicator to skip them or read them 
depending on whether it needs their values.  It knows it can stop after c6 so 
it stops once it has seen that one.

Hope that helps.  If you want the fine detail about the file format, you can 
find it here:

<https://www.sqlite.org/fileformat2.html>

though that won't give you the algorithm details I've described above.

Simon.

Reply via email to