> Sent: Thursday, October 29, 2015 at 1:53 PM
> From: "Simon Slavin" <slavins at bigfraud.org>
> To: "SQLite mailing list" <sqlite-users at mailinglists.sqlite.org>
> Subject: Re: [sqlite] How would sqlite read this from disk?
>
> 
> On 29 Oct 2015, at 5:20pm, Jason H <jhihn at gmx.com> wrote:
> 
> > Thanks, this is some really great information!
> 
> You're welcome.
> 
> > If I could ask a followup question. You made the statement "SQLite reads 
> > that row of the table from storage, from the first column to the last 
> > column needed by the SELECT, but perhaps not all the way to the end of the 
> > columns in the row.", Given an example select that requires columns 2,3,5, 
> > does this mean that column 4 is also read, or is column 4 be skipped? I 
> > guess the question is I am assuming that a row is serialization of one or 
> > more strings that are a serialization of a string preceded by their lengths 
> > (i.e. [L|data] ), would SQLite do something akin to (using my 2,3,5 
> > example):
> > begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read 
> > L3*byte, read L4, seek L4, read L5, read L5*byte
> > or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte' ?
> 
> You are missing some useful information.  Whenever SQLite has to change even 
> one variable-length column in a row it rewrites the whole row.  It has no way 
> of storing a short column in the space originally used by a longer one.  So 
> even if all you do is
> 
> UPDATE t SET c5='short string' WHERE c5='the original extremely long string'
> 
> the new copy of any of those rows has the new c6 immediately following the 
> new c5.  If you've read to the end of c5 then you're already pointing to c6 
> so there's no need to seek it.  Contriwise, the only way to seek c6 is to 
> read all the columns before it.  There is no pointer to the beginning of c6 
> and no magic mark that says "Column c6 starts here".
> 
> I'll answer your specific question (I hope) with an example.
> 
> CREATE TABLE t (c1, c2, c3, ... c8);
> CREATE INDEX t_23 ON myTable (c2, c3);
> SELECT c1, c3, c5, c6, FROM t WHERE c2='Smith' ORDER BY c3;
> 
> Given the above, the clauses in the SELECT can be perfectly satisfied by the 
> index, but that index doesn't supply the c1, c5 or c6 needed by the SELECT.
> 
> First SQLite figures out which rows, are needed to answer the query.  And in 
> doing that it decides to use index t_23 which means that for each of those 
> rows it already has the value of c3.  So knows which rows to read, and it 
> knows that it still needs c1, c5 and c6 from each of those rows.  The rows go 
> up to c8, but it doesn't need any column past c6.
> 
> So for each row needed, SQLite is pointed to the beginning of c1 for that 
> row.  There's no way of knowing where to find c5 without reading all the 
> preceding columns, because any of those may be any length.  So it progresses 
> through all the columns from c1 to c6 using the length-indicator to skip them 
> or read them depending on whether it needs their values.  It knows it can 
> stop after c6 so it stops once it has seen that one.
> 
> Hope that helps.  If you want the fine detail about the file format, you can 
> find it here:
> 
> <https://www.sqlite.org/fileformat2.html>
> 
> though that won't give you the algorithm details I've described above.

This seems to be at odds with my understanding of "2.1 Record Format" of the 
document. Given that it reads the row varint, which contains the length of 
itself and the serial types of the columns, in order, it should be completely 
able to skip the reading of unneeded columns. If I have 39,39,39 as my header 
(excepting the length of header) I know I am dealing with 3 text columns of 13 
bytes each. If I want only the last column, I can then seek(78), read(26). At 
no time did I need to read the prior two columns.

Or did I misunderstand something?

Many thanks, yet again.


Reply via email to